Author Archives: dougfolksy

Brief outage at 15:55

Hello, and sorry: we had a brief outage at 15:55, which we fixed within about 5 minutes.

One of our key servers had run out of memory. I’m looking into why we didn’t get alerted when it was getting full.

We’re sorry for any inconvenience.

Doug.

Site issues, in maintenance mode — we are currently investigating

Hello, and sorry that we’re having issues at the moment. Please be assured that we are investigating them and working to get the site back up.

Doug.

**UPDATE:** The site is back up. Our search index provider had a service outage and I’m waiting to hear back from them with the precise details. In the meantime, I’m preparing an alternative service to switch to in case there are any further issues, and of course I’m continually monitoring things. I’m so sorry to anybody who was inconvenienced. Doug.

Tonight’s maintenance mode delayed slightly, will now be after 3am

Hello, the maintenance mode planned for between 2am and 3am has been slightly delayed and will now take place after 3am.

Thanks,

Doug.

Maintenance mode sometime between 2am and 3am this morning

Hi, we need to perform updates to some of our underlying hardware in the wee hours of this morning.

This will require us to go into maintenance mode for a short while, hopefully around 15 minutes or even less.

We aim to start the work at around 2am, although it might be delayed for a short while depending on whether some automated tasks have completed or not.

I’ll keep you posted on progress here, and thank you so much for your patience whilst we undertake the work.

Thank you,

Doug.

Maintenance mode this morning around 02:00

Hi — we have some significant backend updates to push this morning which will require us to go briefly into maintenance mode.

This shouldn’t last more than 10 minutes or so. We’ll let you know here if that estimate changes.

Thanks so much for bearing with us whilst we run these.

Doug.

15 minute outage just now

Hello — we just had an outage for around 15 minutes.

The cause was me running an intensive background system update that, with the site being busy, tipped one of our key servers over.

I’m really sorry about that.

I got the server back up in about 10 minutes and things were back to normal 5 minutes after that. Naturally, I’ll be keeping an eye on things to make sure we’re all good.

Once again, I’m sorry about that. I didn’t realise just how intensive the update was. I’ll reschedule it to run in the wee hours when we’re quieter.

Thanks,

Doug.

Image issues this morning

Hi, just letting you know that we’re working on fixing the image issue we’re experiencing. I’ll update here when we know more.

Thanks so much for bearing with us.

Doug.

UPDATE: The server that runs our image cache fell over. I’ve rebuilt and replaced it and the system seems to be working again now, although obviously I’ll keep monitoring it. We’re now investigating what caused the old server to fall over.

UPDATE 2: We’ve had notification that the original server had hardware degradation issues, which explains things. Whilst very rare, these sorts of things will happen from time to time in complex systems. The silver lining in this incident is that we were able to get a whole new server in place and running within 30 minutes of the first alert. My sincere apologies to anyone who was inconvenienced by the outage and my thanks for everybody’s patience whilst we resolved this.

Doug.

5 minute outage just now

Hi — sorry about the brief outage just now. It was due to a bad code merge making it into a deploy. Ordinarily this can’t happen but I mistakenly thought the changes I’d made didn’t affect any production code and took a shortcut.

I’ve rolled the changes back and am fixing the issue now.

Again, my apologies for that.

Doug.

Outages this afternoon

Hello, we had a few outages this afternoon, ranging in length from 5 to 40 minutes.

As you can imagine, we were madly scrambling to find and fix the errors and we’re happy that we have now done this.

It turns out that we had an intermittent issue with our search index: once we fixed this, everything started working again.

I’m going to keep monitoring the site this evening to make sure all is well and stays that way.

My first job for tomorrow morning is to write a script that will monitor for this sort of error and fix it as soon as it occurs in the future. I don’t anticipate that it will occur very often, because it’s not one we’ve seen before, but I want to be able to tell you all with confidence that this particular issue won’t take the site down again.
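For the technically curious, a monitor-and-fix script of this general kind could be sketched in Python roughly as below. This is only an illustrative sketch, not the actual script: the `check` and `repair` callables are hypothetical placeholders for whatever test detects the search index error and whatever action clears it, neither of which is described here.

```python
import time
from typing import Callable


def watch_and_repair(
    check: Callable[[], bool],      # hypothetical: returns True when the index is healthy
    repair: Callable[[], None],     # hypothetical: the action that fixes the error
    max_checks: int,
    interval_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `check` up to `max_checks` times, calling `repair` on each failure.

    Returns the number of repairs attempted.
    """
    repairs = 0
    for _ in range(max_checks):
        try:
            healthy = check()
        except Exception:
            # A health check that crashes is treated the same as an unhealthy result.
            healthy = False
        if not healthy:
            repair()
            repairs += 1
        # Wait before the next poll.
        sleep(interval_seconds)
    return repairs
```

Passing the check and the repair in as functions keeps the polling loop independent of any particular search provider, and makes it easy to try out with stand-in functions before pointing it at a live system.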

I apologise to everyone who was inconvenienced by this afternoon’s outages.

I’m also sincerely grateful to you all for your patience with us whilst we investigated and fixed it.

Thank you,

Doug.

Maintenance mode at 5am today

Hi, we’ve got some work to deploy this morning that requires us to rebuild our search indices.

This will require us to go into maintenance mode for a while. We’re going to try to keep it to under half an hour.

The indices will be rebuilding for around an hour or so after that, which means that, for a short while, not all items will appear on all pages.

Thanks for bearing with us whilst we undertake this work and I hope you’re all doing well.

Doug.