I always get the error page. I can still access a topic from my history, but since yesterday I can't see the list of topics anymore.
You're not alone.
Is it Europe only or all around the world?
Seems like the site has been more buggy to me since the week before build.
Or is it just me?
me as well
I'll be working out in the meantime:
Sorry everyone!! We are looking into the problem now! I'll keep you all posted.
It's up and running
Was it a DoS attack?
@figuerres: I love you guys, always want to know the details
Ok, so here is what happened
- SQL Data Sync is what we use to move data around the world to our different data centers
- You can't have circular references in tables that Data Sync is going to replicate, because it replicates a table at a time, not log shipping or other transaction-by-transaction style of replication
- We removed some foreign key references between Thread and Post (thread has a property 'last post ID', which is a denormalized reference to the most recent post so that we can quickly grab the date and commenter for the thread view) and between Forum and Post (forum also has a 'last post id' property for the same reason)
- We have a feature for admins that deletes all the comments from a given user (useful after a spammer has added comments in 20 threads on the site)
- That feature was written very naively: it would fail if a given post/comment had any references to it, which was basically only if it was the last comment in a thread or the last comment in the entire forum.
- Before we implemented data sync, that would result in an error in the admin UI (we were unable to delete all the user's comments)
- Deleting a single comment right in the forum doesn't have these issues, so we weren't rushing to fix the 'delete all comments' feature
- Now that the foreign key constraints are removed between those tables, it doesn't error out and instead leaves the Forum record and the Thread record in a broken state *if* the comment being deleted is the very last one in a forum or in a thread.
- When we go to retrieve that Thread or Forum, the ORM tries to load up the associated child entity using the key, which now points to nothing, and throws a 'not found' exception.
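The failure mode described above can be reproduced in a few lines. This is only a rough sketch in Python with SQLite as a stand-in; the table names, column names, and helper functions are assumptions for illustration, not the site's actual schema or ORM.

```python
import sqlite3

# Hypothetical, simplified schema: thread.last_post_id has no foreign key,
# mirroring the constraint that was removed for Data Sync.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thread (id INTEGER PRIMARY KEY, last_post_id INTEGER);
    CREATE TABLE post (id INTEGER PRIMARY KEY, thread_id INTEGER, author TEXT);
    INSERT INTO thread VALUES (1, 11);
    INSERT INTO post VALUES (10, 1, 'alice'), (11, 1, 'spammer');
""")

def naive_delete_all_by(conn, author):
    # Removes the posts but never touches thread.last_post_id.
    conn.execute("DELETE FROM post WHERE author = ?", (author,))

def load_last_post(conn, thread_id):
    # Mimics an ORM that raises when a referenced row is missing.
    (last_id,) = conn.execute(
        "SELECT last_post_id FROM thread WHERE id = ?", (thread_id,)
    ).fetchone()
    row = conn.execute(
        "SELECT id, author FROM post WHERE id = ?", (last_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"post {last_id} not found")
    return row

naive_delete_all_by(conn, "spammer")  # succeeds silently, no constraint to stop it
try:
    load_last_post(conn, 1)
    failed = False
except LookupError:
    failed = True  # the thread now points at a post that no longer exists
```

With a foreign key in place the delete itself would have been rejected; without it, the error only surfaces later, when the thread is read.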
What we have done and what we are doing to fix:
- First thing we did, which gets the site back but is definitely only a temporary fix, was to update the offending rows manually
- Next, we changed the handler for when an entity id is not found in the database to not throw an exception, but to just log that error and treat the result as if the key was null.
- Finally, we are rewriting the bulk deletion tool to correctly update the related tables as it removes comments.
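The last two fixes might look roughly like the following. Again this is a sketch in Python with SQLite, using an assumed schema and hypothetical function names; only the Thread side is modeled, and the highest remaining post id is assumed to stand in for "most recent".

```python
import logging
import sqlite3

log = logging.getLogger("forum")

# Hypothetical, simplified schema (no foreign key on last_post_id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thread (id INTEGER PRIMARY KEY, last_post_id INTEGER);
    CREATE TABLE post (id INTEGER PRIMARY KEY, thread_id INTEGER, author TEXT);
    INSERT INTO thread VALUES (1, 11), (2, 20);
    INSERT INTO post VALUES (10, 1, 'alice'), (11, 1, 'spammer'), (20, 2, 'spammer');
""")

def delete_all_by(conn, author):
    """Delete an author's posts and repair each affected thread's last_post_id."""
    with conn:  # one transaction: commit on success, roll back on error
        threads = [r[0] for r in conn.execute(
            "SELECT DISTINCT thread_id FROM post WHERE author = ?", (author,))]
        conn.execute("DELETE FROM post WHERE author = ?", (author,))
        for tid in threads:
            # MAX(id) is NULL when no posts remain, which clears the reference.
            conn.execute(
                "UPDATE thread SET last_post_id = "
                "(SELECT MAX(id) FROM post WHERE thread_id = ?) WHERE id = ?",
                (tid, tid))

def load_last_post(conn, thread_id):
    """Return the last post row, or None if the key is NULL or dangling."""
    (last_id,) = conn.execute(
        "SELECT last_post_id FROM thread WHERE id = ?", (thread_id,)
    ).fetchone()
    if last_id is None:
        return None
    row = conn.execute(
        "SELECT id, author FROM post WHERE id = ?", (last_id,)
    ).fetchone()
    if row is None:
        # Log instead of raising, and treat the key as if it were NULL.
        log.error("thread %s points at missing post %s", thread_id, last_id)
    return row

delete_all_by(conn, "spammer")
```

Doing the delete and the updates in one transaction means a failure partway through leaves neither table half-modified.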
Thanks Duncan !
That's actually very interesting stuff for me, as I take care of several fair-sized databases and we always worry when we have to modify them about having some side effect that we did not foresee at the time.
And in my work, if I munge up the wrong data it can create real problems, as most of my stuff is tracking money (orders, voids, credits, all that accounting stuff).
@figuerres: Well, the problem here is that we *had* key relationships defined, which should expose a programming mistake as an exception (not the preferred way to find them, but at least our database integrity is maintained), but the need to remove them has messed us up. I understand *why* they had to go, since trying to sync a database with a circular reference would be very tricky, but now we need to essentially duplicate that same constraint through code.
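One way to approximate a dropped constraint in code is a periodic audit query that reports rows the foreign key would have rejected. A minimal sketch, assuming the same hypothetical Thread/Post layout as above and Python with SQLite (not anything the site is confirmed to run):

```python
import sqlite3

# Hypothetical sample data: thread 2 points at a post that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thread (id INTEGER PRIMARY KEY, last_post_id INTEGER);
    CREATE TABLE post (id INTEGER PRIMARY KEY, thread_id INTEGER);
    INSERT INTO thread VALUES (1, 10), (2, 99), (3, NULL);
    INSERT INTO post VALUES (10, 1);
""")

def dangling_threads(conn):
    """Return ids of threads whose last_post_id references a missing post."""
    rows = conn.execute("""
        SELECT t.id
        FROM thread t
        LEFT JOIN post p ON p.id = t.last_post_id
        WHERE t.last_post_id IS NOT NULL AND p.id IS NULL
        ORDER BY t.id
    """)
    return [r[0] for r in rows]

orphans = dangling_threads(conn)
```

A check like this only detects damage after the fact; it does not prevent it the way a real constraint does.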
@Duncanma: I'm facing a similar problem at work, where I will need to go back to data beyond a certain date and update records on a per-user basis. It is very error prone, as there is no real way to test it except on the live system.
I will re-read this thread when I am about to get started.