Up again

I had some bad luck. Last week, my hosting provider performed previously announced routine maintenance on the server running this website. During this maintenance, they managed to destroy the RAID (thanks, guys). All data was lost. I did not have a very recent remote system snapshot and therefore needed to re-create the entire system from scratch.

A couple of months ago, I put Cloudflare in front of my website, so at first I thought: hey, that gives me some time to restore things while the incident stays largely transparent to the visitors of my website. I had activated Cloudflare’s “always online” feature months ago. That’s basically just a cache that jumps in when the backend is down. But wait, what does “down” mean? Cloudflare says that “always online” is triggered when the backend sends a 502 or 504 type response. Currently, when the backend is just dead (not responsive), this cache does not serve anything at all (in a support ticket, they said that they will “add this” in the future). My server went down at night. By the time I had a system and web server running again and sending a 502 response in order to trigger “always online”, about 10 hours after the machine went down, the Cloudflare cache already assumed that my website did not exist anymore. Pah, “always online” was useless. So, I am sorry, the downtime lasted quite long. I had to put all components back together manually.
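For illustration, a stand-in backend that answers every request with a 502 could look roughly like the sketch below. This is not the setup I actually used during the outage; the handler name, the port 8080 and the response text are made up for the example, and whether Cloudflare then really serves a cached copy depends on its behavior at the time.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class BadGatewayHandler(BaseHTTPRequestHandler):
    """Hypothetical placeholder backend: every request gets a 502."""

    def _reply(self):
        body = b"Maintenance in progress.\n"
        self.send_response(502)  # the status said to trigger "always online"
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if self.command != "HEAD":  # HEAD responses carry no body
            self.wfile.write(body)

    # Treat the common request methods identically.
    do_GET = do_HEAD = do_POST = _reply


if __name__ == "__main__":
    # Port 8080 is an assumption; a real setup would listen wherever
    # the proxy in front expects the origin to be.
    HTTPServer(("0.0.0.0", 8080), BadGatewayHandler).serve_forever()
```

The same effect can be had with a one-line rule in a real web server; the point is only that something has to be listening and answering, because a dead machine sends no status code at all.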

The good thing is that I had up-to-date remote backups of my WordPress database (via Dropbox), my nginx config and other configs (via Bitbucket), and a more or less up-to-date remote backup of the directory structure behind my website. While setting things up again, I used the opportunity to re-structure my website a bit. I am now running a modified version of WordPress’ TwentyTwelve theme, for a cleaner appearance and a certain responsiveness.

A few hours before all data was lost, I wrote another blog post. That one was not contained in the latest (daily) WordPress database backup performed before the crash. When I realized that, the first thing I did was cp -a ing my Firefox and Chrome browser caches on the machine where I had written that post. I then started digging in these caches in order to find residual pieces of the article’s content. And I found a golden piece: Chrome had cached a gzipped HTTP response containing the final version of my article, which I found via the chrome://cache/ list. Chrome displays the contents in a hexdump -C fashion. I copied this text and used a Python script to parse the dump, re-create the binary data, and gunzip it. Based on the resulting HTML view of my article, I could quickly add it to the WordPress system again.
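The recovery step can be sketched roughly as follows. This is a reconstruction of the idea, not the original script: the function names are made up, and it assumes the pasted text follows the hexdump -C layout (an offset, up to 16 two-digit hex bytes, then an ASCII column enclosed in | characters) and contains only the response body, not the headers.

```python
import gzip
import re

HEX_BYTE = re.compile(r"^[0-9a-fA-F]{2}$")


def parse_hexdump(text: str) -> bytes:
    """Rebuild the raw bytes from a hexdump -C style text dump."""
    out = bytearray()
    for line in text.splitlines():
        if line.strip() == "*":
            # hexdump -C squeezes repeated lines into '*'; that data is
            # not in the text, so refuse instead of guessing.
            raise ValueError("dump contains squeezed lines ('*')")
        # Drop the ASCII column, which starts at the first '|'.
        tokens = line.split("|", 1)[0].split()
        if len(tokens) < 2:
            continue  # blank line or the trailing offset-only line
        # tokens[0] is the offset; at most 16 byte values follow.
        for tok in tokens[1:17]:
            if not HEX_BYTE.match(tok):
                break
            out.append(int(tok, 16))
    return bytes(out)


def recover_html(dump_text: str) -> str:
    """Parse the dump, gunzip the bytes, and decode them as UTF-8."""
    raw = parse_hexdump(dump_text)
    return gzip.decompress(raw).decode("utf-8", errors="replace")


if __name__ == "__main__":
    import sys
    # Usage (file name is hypothetical): python recover.py < dump.txt > post.html
    sys.stdout.write(recover_html(sys.stdin.read()))
```

If the dump has a different layout, for example an ASCII column without the | delimiters, the line parsing would need adjusting, since stray two-character words in that column could be mistaken for bytes.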

Some lessons learned:

  • Don’t trust scheduled routine maintenance. Perform a backup *right* before it.
  • Cloudflare cache does not help in the situation where you need it most (that’s the current situation, at least).
  • Caches can still be of essential help in such a worst-case scenario. I recovered an article from the browser cache. I use the Google cache in order to see if something is still missing on the new version of my website.
  • Google Webmaster Tools are pretty convenient in the sense that they inform you about crawling errors. I frequently check their interface and realize that there are still some missing pieces of my web presence (files, mostly).
  • Using (remote) code repositories for configuration stuff is the best you can do. First of all, it’s perfect bookkeeping if done properly. Secondly, if regularly pushed, it’s the best configuration backup you can have.

If you are still missing a file here and there or find some dead links, please don’t hesitate to notify me. Thanks.
