We had some problems this morning as a result of the scheduled maintenance carried out in the early hours. The site was slow, with some outages, from about 7am until about 11am.
The scheduled maintenance involved some infrastructure changes: migrating the forums to a dedicated database cluster, as well as the main Folksy site, because we’d had problems with forum performance on the old servers. The maintenance was deemed unsuccessful and we migrated back to the previous database server.
At around 7am the support team started receiving monitoring alerts indicating slow site performance. The problem was related to the database migration carried out earlier in the morning. We contacted our hosting company and performed some database optimisation tasks to try to improve the poor query performance on the database host. This helped, but some queries were still causing problems, which led to the decision to remove the tag-based navigation for practices and materials from the navigation column (below the categories navigation). The item view counts had already been disabled during the earlier migration, as they placed a heavy load on the database. Together, these changes brought database performance back to acceptable levels.
A small number of customers experienced a problem with their shops where items weren’t appearing, although the items were still available from the main Folksy index pages. This was due to a bug in the new database cluster. The cluster has now been upgraded and the problem appears to have been eliminated. Unfortunately, the upgrade required a 10-minute outage (from approximately 6.30pm to 6.40pm).
We intend to re-enable the item view counters tomorrow morning, provided the service experiences no further problems this evening. The tag navigation is currently being rewritten and we expect to reinstate it within the next fortnight, again provided the service remains stable.
Apologies to all our customers for the problems this has caused. We’re working hard to ensure they don’t occur again.