One of the servers in our cluster just stopped serving web pages. The webserver on that machine refused to serve any pages, so customers routed to it could not reach the blog, the forum, or the main site, and would have seen no page returned at all.
Our support team couldn’t find any obvious issue and restarted the service. The outage lasted about 30 minutes and affected only customers accessing that particular server; customers accessing other servers were unaffected.
We suspect this issue is related to our move to the new platform, and we’ll try to collect more information about the cause of the failure if we see it happen again.
UPDATE: We’ve added some extra service monitoring so that the webserver is restarted automatically if this condition persists for more than a few minutes.
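For readers curious how this kind of monitoring can work, here is a minimal sketch in Python. It is an illustration, not our actual monitoring setup: the `Watchdog` class, the `is_serving` and `run` helpers, the 180-second grace period, and the 30-second polling interval are all assumptions made for the example.

```python
import subprocess
import time
import urllib.error
import urllib.request


class Watchdog:
    """Decides when a restart is due, based on how long checks have failed."""

    def __init__(self, grace_seconds=180.0):
        # Assumed value: "a few minutes" is taken here as 180 seconds.
        self.grace_seconds = grace_seconds
        self.failing_since = None

    def observe(self, healthy, now):
        """Record one health check; return True if a restart should happen."""
        if healthy:
            # Any successful check clears the failure window.
            self.failing_since = None
            return False
        if self.failing_since is None:
            # First failed check: start timing, but do not restart yet.
            self.failing_since = now
            return False
        if now - self.failing_since >= self.grace_seconds:
            # Failure has persisted long enough; reset so the next
            # failure window starts fresh after the restart.
            self.failing_since = None
            return True
        return False


def is_serving(url, timeout=10.0):
    """Return True if the URL answers without an HTTP error or a
    connection failure (urlopen raises for 4xx/5xx responses)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False


def run(url, restart_command, interval=30.0):
    """Poll the URL forever and run restart_command when a restart is due."""
    dog = Watchdog()
    while True:
        if dog.observe(is_serving(url), time.monotonic()):
            # restart_command is a list of arguments supplied by the caller.
            subprocess.run(restart_command, check=False)
        time.sleep(interval)
```

A call might look like `run("http://localhost/", ["systemctl", "restart", "example-webserver"])`, where both the URL and the service name are hypothetical. Waiting for the failure to persist, instead of restarting on the first failed check, avoids restarting the webserver over a single slow or dropped request.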