Outage due to database issues this evening

We had a pretty serious outage this evening.
We worked constantly to identify and fix the error as soon as it reared its head. Once we'd traced the root cause to the database, I got on the phone with an engineer from our database service provider, and he helped me fix it.
Basically, the database was running low on available memory. It isn't something we've seen before, so it took a while, and some help from the provider's support engineer, to identify the problem and come up with a solution. We now know to go in and free up memory if this happens again, and we know what to keep an eye on to stop it from becoming a serious issue in the future. We're also looking at upgrading the database so that we'll have more memory headroom in general.
We hit a tipping point with the low memory tonight, which is why we saw a sudden rash of issues all across the site. However, going back through the database stats, it looks like it’s been struggling with memory periodically for a while. This would definitely have been a contributing factor to the occasional heavy load errors that have been reported, so hopefully the work we’ve done tonight on the database (along with the other optimisation work we’ve been doing over the last few weeks) will eliminate those.
Obviously, it’s something we’ll be keeping a very close eye on.
I hope that explains the issues we’ve seen this evening and our efforts to resolve them as quickly as possible. I’m really sorry to everyone who was affected by the outage and I hope this post offers some assurance that we’ve identified and fixed the cause of it.
Thank you so much for your patience whilst we fixed the issue,
Doug.
