Category Archives: System administration

Discourse on Debian Wheezy via Docker

Introduction

The makers of the StackExchange network have been working on Discourse for quite a while now. It is a modern communication platform, with the goal of being a nice mixture of classical Internet forums, mailing lists, and established social media features:

Discourse is a from-scratch reboot, an attempt to reimagine what a modern, sustainable, fully open-source Internet discussion platform should be today – both from a technology standpoint and a sociology standpoint.

Technically, Discourse is built from several components, with Ruby on Rails at its core, Thin as a lightweight web server, Redis for caching and job control, Sidekiq for background job processing, and PostgreSQL as the persistent database backend. Consequently, deploying Discourse right from the git repository can be a time-consuming task, not to mention the implications of maintaining a Discourse instance or running multiple Discourse instances on the same host. The Discourse developers realized that the complicated deployment keeps a number of people from using it. Fortunately, the awesome “lightweight VM” container system Docker became production-ready in June 2014:

This release’s “1.0” label signifies a level of quality, feature completeness, backward compatibility and API stability to meet enterprise IT standards. In addition, to provide a full solution for using Docker in production we’re also delivering complete documentation, training programs, professional services, and enterprise support.

… and Discourse consequently decided to work on a Docker-based deployment system, which is the default by now, and even the only supported method (as you can see, Discourse evolves quite quickly). However, Docker is officially supported only on very modern Linux distributions, because Docker’s libcontainer as well as AuFS require kernel features that were introduced only recently. Consequently, Docker (and with it Discourse) is not supported on e.g. Debian 7 (Wheezy), the current “stable” Debian that sysadmins love. But there is something we can do about it.

Although not officially supported or recommended, Docker and Discourse can be run on Debian 7 by backporting a more modern Linux kernel. Currently, Wheezy comes with a kernel from the 3.2 branch. The wheezy-backports (or Debian 8) kernel is currently from the 3.14 branch, which is new enough. Is using a kernel from backports safe? I can only say that my system runs without quirks, and, as you will find on the web, a number of other people are also successfully running a kernel from backports.

Hence, if you want to deploy Discourse on Wheezy, you can do it, and it will work perfectly fine. I will quickly go through the required steps in the next sections.

Using a kernel from backports

Follow the official instructions for using the backports repository:

# echo "deb http://ftp.de.debian.org/debian wheezy-backports main" > \
    /etc/apt/sources.list.d/wheezy-backports.list
# apt-get update

Get the newer kernel:

# apt-get -t wheezy-backports install linux-image-amd64 linux-headers-amd64

Reboot:

# reboot

Verify:

$ uname -a
Linux gurke2 3.14-0.bpo.1-amd64 #1 SMP Debian 3.14.12-1~bpo70+1 (2014-07-13) x86_64 GNU/Linux

Install Docker

Follow the official installation instructions for Ubuntu (which Debian is very close to):

# apt-get install apt-transport-https
# apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9
# sh -c "echo deb https://get.docker.io/ubuntu docker main \
   > /etc/apt/sources.list.d/docker.list"
# apt-get update
# apt-get install lxc-docker
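
A quick sanity check that the installation worked and that the Docker daemon is running:

# docker version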

Install Discourse ecosystem

Following the instructions from here and here:

# install -g docker -m 2775 -d /var/docker
# adduser discourseuser
# usermod -a -G docker discourseuser

Now, as user discourseuser:

$ git clone https://github.com/discourse/discourse_docker.git /var/docker

Configure Discourse app

As user discourseuser:

$ cd /var/docker
$ cp samples/standalone.yml containers/app.yml

Edit app.yml to your needs. In my case, I made the Discourse container’s web server and SSH server not listen on an external interface, by changing the expose section to:

expose:
  - "127.0.0.1:8200:80"
  - "127.0.0.1:2222:22"

That means that the container’s web and SSH servers are reachable through localhost, but not from the outside. This has important security advantages:

  • We can proxy the HTTP traffic of the Discourse web application through a web server running on the host (e.g. nginx), and encrypt the connection (the Discourse Docker container does not have TLS support built-in, duh); a minimal nginx sketch follows below this list.
  • Why would you want to expose the Discourse container via SSH to the Internet, anyway? That would be a severe mistake, in my opinion; keep SSH bound to localhost unless you have a very good reason not to.
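
Regarding the first point, here is a minimal sketch of such a reverse proxy configuration for nginx on the host (hedged: forum.example.org and the vhost file name are placeholders, and the TLS part, i.e. listening on port 443 with your certificate, is left out for brevity):

# cat > /etc/nginx/sites-available/discourse <<'EOF'
server {
    listen 80;
    server_name forum.example.org;

    location / {
        proxy_pass http://127.0.0.1:8200;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
EOF
# ln -s /etc/nginx/sites-available/discourse /etc/nginx/sites-enabled/discourse
# service nginx reload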

I have set DISCOURSE_SMTP_ADDRESS: 172.17.42.1, as 172.17.42.1 is the IP address of the host in the private virtual Docker network in my case. On my Debian host, I am running an Exim4 MTA, which is configured with dc_local_interfaces='172.17.42.1' and dc_relay_nets='172.17.0.0/16', i.e. it listens on the local Docker network and relays mails incoming from that network. That way, the Discourse instance running within a Docker container can send mail through the Exim MTA running on the Debian host. I have described that in more detail in another blog post.

Each change of the configuration file app.yml requires a re-build via ./launcher rebuild app. A re-build involves stopping, destroying, bootstrapping, and launching the container. What I have learned just recently: don’t worry too much about the word “destroy”. It does not imply data loss. You can re-build your Discourse container at any time. Persistent data is stored in the PostgreSQL database and in the shared volume, neither of which is affected by a re-build.
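
For reference, after editing app.yml the whole cycle boils down to (as user discourseuser, from within the repository directory):

$ cd /var/docker
$ ./launcher rebuild app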

Hope that helps!

Discourse Docker container: send mail through Exim

Introduction

The Discourse deployment was greatly simplified by the introduction of Docker support (as I have written about before). Discourse heavily depends on e-mail, and its ability to send mail to arbitrary recipients is essential. While the recommended way is to use an external service like Mandrill, it is also possible to use a local MTA, such as Exim. However, the vanilla Discourse Docker container does not contain a pre-configured MTA, which is fine, since many hosts already have a well-configured MTA running. The question is how to let Discourse send mail through that MTA.

Usually, MTAs on smaller machines are configured to listen on localhost only, so that they are not exposed to the Internet and cannot be misused for spam. localhost on the host itself, however, is different from localhost within a Docker container. The network within the container is a virtual one, cleanly separated from the host. That is, when Discourse running in a container tries to reach an SMTP server on localhost, it cannot reach an MTA listening on localhost outside of the container. There is a straightforward solution: Docker comes with a network bridge. In fact, it provides a private network (in the 172.17.x.x range) that connects individual containers with the host. This network can be used for establishing connectivity between a network application within a Docker container and the host.
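
On the host, this bridge shows up as the docker0 network interface, so its address can be looked up roughly like this (classic net-tools ifconfig, as shipped with Wheezy):

$ /sbin/ifconfig docker0 | grep 'inet addr'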

Exim’s network configuration

Accordingly, I have set up Exim4 on the Debian host to relay mail coming in from localhost or from the local virtual Docker network. First I looked up the IP address of the Docker bridge on the host, which is 172.17.42.1 in my case (obtained via /sbin/ifconfig). I then instructed Exim to treat this as a local interface and listen on it. Also, Exim was explicitly told to relay mail coming in from the subnet 172.17.0.0/16; otherwise it would reject incoming mail from that network. These are the relevant keys in /etc/exim4/update-exim4.conf.conf:

dc_local_interfaces='127.0.0.1;172.17.42.1'
dc_relay_nets='172.17.0.0/16'

The configuration update takes effect after calling update-exim4.conf and restarting Exim via service exim4 restart.
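
In command form, with an optional check appended that verifies Exim now also listens on the bridge address (this assumes net-tools’ netstat is available):

# update-exim4.conf
# service exim4 restart
# netstat -tlnp | grep exim4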

Testing SMTP access from within container

I tested whether Exim’s SMTP server can be reached from within the container, using the bare-bones SMTP client from Python’s smtplib. First of all, I SSHed into the container by calling launcher ssh app. I then called python. The following Python session demonstrates how I established an SMTP connection right to the host via its IP address in Docker’s private network:

>>> import smtplib
>>> server = smtplib.SMTP('172.17.42.1')
>>> server.set_debuglevel(1)
>>> server.sendmail("rofl@gehrcke.de", "jgehrcke@googlemail.com", "test")
send: 'ehlo [172.17.0.54]\r\n'
reply: '250-localhost Hello [172.17.0.54] [172.17.0.54]\r\n'
reply: '250-SIZE 52428800\r\n'
reply: '250-8BITMIME\r\n'
reply: '250-PIPELINING\r\n'
reply: '250 HELP\r\n'
reply: retcode (250); Msg: localhost Hello [172.17.0.54] [172.17.0.54]
SIZE 52428800
8BITMIME
PIPELINING
HELP
send: 'mail FROM:<rofl@gehrcke.de> size=4\r\n'
reply: '250 OK\r\n'
reply: retcode (250); Msg: OK
send: 'rcpt TO:<jgehrcke@googlemail.com>\r\n'
reply: '250 Accepted\r\n'
reply: retcode (250); Msg: Accepted
send: 'data\r\n'
reply: '354 Enter message, ending with "." on a line by itself\r\n'
reply: retcode (354); Msg: Enter message, ending with "." on a line by itself
data: (354, 'Enter message, ending with "." on a line by itself')
send: 'test\r\n.\r\n'
reply: '250 OK id=1X9bpF-0000st-Od\r\n'
reply: retcode (250); Msg: OK id=1X9bpF-0000st-Od
data: (250, 'OK id=1X9bpF-0000st-Od')
{}

Indeed, the mail arrived at my Google Mail account. This test shows that the Exim4 server running on the host is reachable via SMTP from within the Discourse Docker instance. Until I got the configuration right, I observed essentially two different classes of errors:

  • socket.error: [Errno 111] Connection refused in case there is no proper network routing or connectivity established.
  • smtplib.SMTPRecipientsRefused: {'jgehrcke@googlemail.com': (550, 'relay not permitted')} in case the Exim4 SMTP server is reachable but rejects your mail (to solve this, I had to add dc_relay_nets='172.17.0.0/16' to the configuration shown above).

Obviously, in order to make Discourse use that SMTP server, DISCOURSE_SMTP_ADDRESS needs to be set to the IP address of the host in the Docker network, i.e. 172.17.42.1 in my case.

Hope that helps!

Good to know checkrestart from debian-goodies

The security audits triggered by Heartbleed have led to the discovery of an increasing number of security issues in libraries such as GnuTLS and OpenSSL within the past weeks. These issues were dealt with responsibly, i.e. they were usually published together with the corresponding binary updates by major Linux distributions such as Debian (subscribing to debian-security-announce or comparable sources is highly recommended if you want to keep track of these developments).

After downloading an updated binary of a shared library, it is important to restart all services (processes) that are linked against this library in order to make the update take effect. Usually, processes load a shared library’s code segment into memory once during startup (actually, the program loader / runtime linker does this) and do not reload that code afterwards throughout their lifetime, which may be weeks or months in the case of server processes. Prime examples are Nginx/Apache/Exim being linked against OpenSSL or GnuTLS: if you update the latter but do not restart the former, you have changed your disk contents but not your running service. In the worst case, the system is still vulnerable. It should therefore be the habit of a good system administrator to question which services are using a certain shared library and to restart them after a shared library update, if required.

There is a neat helper utility in Debian, called checkrestart. It comes with the debian-goodies package. First of all, what is debian-goodies? Let’s see (quote from Wheezy’s package description):

Small toolbox-style utilities for Debian systems
 
These programs are designed to integrate with standard shell tools, extending them to operate on the Debian packaging system.
 
 dgrep  - Search all files in specified packages for a regex
 dglob  - Generate a list of package names which match a pattern
 
These are also included, because they are useful and don't justify their own packages:
 
 debget          - Fetch a .deb for a package in APT's database
 dpigs           - Show which installed packages occupy the most space
 debman          - Easily view man pages from a binary .deb without extracting
 debmany         - Select manpages of installed or uninstalled packages
 checkrestart    - Help to find and restart processes which are using old
                   versions of upgraded files (such as libraries)
 popbugs         - Display a customized release-critical bug list based on
                   packages you use (using popularity-contest data)
 which-pkg-broke - find which package might have broken another
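
Getting checkrestart therefore boils down to installing that package:

# apt-get install debian-goodies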

Checkrestart is a Python application wrapping lsof (“list open files”). It tries to identify files that are used by running processes but no longer exist in the file system. How so?

Note that during an update a given binary file gets replaced: the new version is first downloaded to disk and then rename()ed in order to overwrite the original. During the POSIX rename() the old file is deleted. But the old file may still be in use! The standard says that if any process still has the file open at the time of its deletion, the file remains “in existence” until the last file descriptor referring to it is closed. While such files are still held “in existence” by the operating system for the running processes, they are no longer listed in the file system. They can, however, easily be identified via the lsof tool. And this is exactly what checkrestart does.
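
If you want to see such deleted-but-still-open files yourself, lsof can list them directly. A rough manual check (the +L1 option restricts the output to open files with a link count smaller than one, i.e. files that have been unlinked; the grep narrows this down to shared libraries):

# lsof +L1 | grep '\.so'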

Hence, checkrestart “compares” the open files used by running processes to the corresponding files in the file system. If the file system contains other (e.g. newer) data than what the process is currently using, then checkrestart proposes to restart that process. In a tidy server environment, this is usually the case only for updated shared library files. Below you can find example output after updating Java, Python, OpenSSL, and GnuTLS:

# checkrestart
Found 12 processes using old versions of upgraded files
(5 distinct programs)
(5 distinct packages)
 
Of these, 3 seem to contain init scripts which can be used to restart them:
The following packages seem to have init scripts that could be used
to restart them:
nginx-extras:
    20534   /usr/sbin/nginx
    20533   /usr/sbin/nginx
    20532   /usr/sbin/nginx
    19113   /usr/sbin/nginx
openssh-server:
    3124    /usr/sbin/sshd
    22964   /usr/sbin/sshd
    25724   /usr/sbin/sshd
    22953   /usr/sbin/sshd
    25719   /usr/sbin/sshd
exim4-daemon-light:
    3538    /usr/sbin/exim4
 
These are the init scripts:
service nginx restart
service ssh restart
service exim4 restart
 
These processes do not seem to have an associated init script to restart them:
python2.7-minimal:
    2548    /usr/bin/python2.7
openjdk-7-jre-headless:amd64:
    4348    /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java

Nginx (web server) and OpenSSH (SSH server) are linked against OpenSSL, and Exim (mail transfer agent) is linked against GnuTLS, so this output makes sense. Obviously, after an update of Python and Java, processes using the interpreter or the VM (respectively) also need to be restarted in order to use the new code. Checkrestart is extremely helpful. Thanks for this nice tool.
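
Such a linkage can quickly be verified by hand, by the way, e.g. for the Exim binary from the listing above:

$ ldd /usr/sbin/exim4 | grep -i gnutls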

Update 2014-09-03:

With the upcoming Debian 8 (Jessie, currently unstable), a new package has been introduced: needrestart. It is a more modern version of checkrestart and tightly integrates with the systemd service infrastructure. Notably, needrestart comes in its own package, with the goal of becoming better known than checkrestart (which was somewhat hidden in debian-goodies, as discussed above). There is currently a discussion going on on the debian-security mailing list about installing and running needrestart in a default Debian installation.

Thin out your ZFS snapshot collection with timegaps

Recently, I released timegaps, a command line tool for, among other things, implementing backup retention policies. In this article I demonstrate how timegaps can be applied to filtering ZFS snapshots, i.e. to identifying those snapshots that can be deleted according to a certain backup retention policy.

Start by listing the names of the snapshots (in my case of the usbbackup/synctargets dataset):

$ zfs list -r -H -t snapshot -o name usbbackup/synctargets
usbbackup/synctargets@20140112-191516
usbbackup/synctargets@20140112-193913
usbbackup/synctargets@20140112-195055
usbbackup/synctargets@20140112-195320
usbbackup/synctargets@20140113-170032
[...]
usbbackup/synctargets@20140327-174655
usbbackup/synctargets@20140328-141257
usbbackup/synctargets@20140330-183001
usbbackup/synctargets@20140331-172543
usbbackup/synctargets@20140401-175059
usbbackup/synctargets@20140402-180042

As you can see, I have encoded the snapshot creation time in the snapshot name. This is a prerequisite for the method presented here.
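
In case your snapshots are not named like this yet: a snapshot following this naming scheme can be created along these lines (a sketch, using the dataset from above):

# zfs snapshot usbbackup/synctargets@$(date +%Y%m%d-%H%M%S)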

In the following command line, we provide this list of snapshot names to timegaps via stdin. We instruct timegaps to keep the following snapshots:

  • one recent snapshot (i.e. younger than 1 hour)
  • one snapshot for each of the last 10 hours
  • one snapshot for each of the last 30 days
  • one snapshot for each of the last 12 weeks
  • one snapshot for each of the last 14 months
  • one snapshot for each of the last 3 years

… and to print the other ones — the rejected ones — to stdout. This is the command line:

$ zfs list -r -H -t snapshot -o name usbbackup/synctargets | timegaps \
      --stdin --time-from-string 'usbbackup/synctargets@%Y%m%d-%H%M%S' \
      recent1,hours10,days30,weeks12,months14,years3

As you can see, the rules are provided to timegaps via the argument string recent1,hours10,days30,weeks12,months14,years3. The switch --time-from-string 'usbbackup/synctargets@%Y%m%d-%H%M%S' tells timegaps how to parse the snapshot creation time from a snapshot name. Finally, --stdin instructs timegaps to read items from stdin (instead of from the command line arguments, which would be the default).

See it in action:

$ zfs list -r -H -t snapshot -o name usbbackup/synctargets | timegaps \
      --stdin --time-from-string 'usbbackup/synctargets@%Y%m%d-%H%M%S' \
      recent1,hours10,days30,weeks12,months14,years3
usbbackup/synctargets@20140227-180824
usbbackup/synctargets@20140228-201639
usbbackup/synctargets@20140301-180728
[...]
usbbackup/synctargets@20140313-235809

You don’t really see the difference here because I cropped the output. The following is proof that (for my data) timegaps decided (according to the rules) that 41 of 73 snapshots are to be rejected:

$ zfs list -r -H -t snapshot -o name usbbackup/synctargets | wc -l
73
$ zfs list -r -H -t snapshot -o name usbbackup/synctargets | timegaps \
    --stdin --time-from-string 'usbbackup/synctargets@%Y%m%d-%H%M%S' \
    recent1,hours10,days30,weeks12,months14,years3 | wc -l
41

That command line can easily be extended to create a little script for actually deleting these snapshots. sed is useful here, prepending the string 'zfs destroy ' to each output line (each line corresponds to one rejected snapshot):

$ zfs list -r -H -t snapshot -o name usbbackup/synctargets | timegaps \
    --stdin --time-from-string 'usbbackup/synctargets@%Y%m%d-%H%M%S' \
    recent1,hours10,days30,weeks12,months14,years3 | \
    sed 's/^/zfs destroy /' > destroy_snapshots.sh
$ cat destroy_snapshots.sh
zfs destroy usbbackup/synctargets@20140227-180824
zfs destroy usbbackup/synctargets@20140228-201639
[...]
zfs destroy usbbackup/synctargets@20140325-215800
zfs destroy usbbackup/synctargets@20140313-235809

Timegaps is well-tested via unit tests, and I use it in production. However, at the time of writing I have not received any feedback from others. Therefore, please review destroy_snapshots.sh and check that it makes sense. Only then execute it.
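
If the script passes your review, executing it is a one-liner (run it as a user with the permission to destroy these snapshots):

$ sh destroy_snapshots.sh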

I expect this post to raise some questions, regarding data safety in general and possibly regarding the synchronization between snapshot creation and deletion. I would very much appreciate questions and feedback in the comments section below. Thanks!

Presenting timegaps, a tool for thinning out your data

I have released timegaps, a command line program for sorting a set of items into rejected and accepted ones, based on the age of each item and on user-given time categorization rules. While this general description sounds quite abstract, the concept is simple to grasp when considering timegaps’ main use case (quote from the readme file):

Timegaps allows for thinning out a collection of items, whereas the time gaps between accepted items become larger with increasing age of items. This is useful for keeping backups “logarithmically” distributed in time, e.g. one for each of the last 24 hours, one for each of the last 30 days, one for each of the last 8 weeks, and so on.

A word in advance: I would very much appreciate your feedback on timegaps. And if you like it, spread it. Thanks!

Motivation: simple implementation of backup retention policies

Backup strategies must be very well thought through. An important question is at which point old backups are to be deleted or, in other words, which old backups are to be kept for how long. This is generally implemented as a so-called data retention policy — a quote from Wikipedia (Backup):

The secondary purpose of backups is to recover data from an earlier time, according to a user-defined data retention policy, typically configured within a backup application for how long copies of data are required.

Why is the implementation of such a policy important? Obviously, storing all periodically (e.g. daily) created snapshots wastes valuable storage space. A backup retention policy allows one to precisely determine which snapshots will be kept for how long, and lets users find a trade-off between data restoration needs and the cost of backup storage.

People usually implement an automatic backup solution which takes periodic snapshots/backups of a certain data repository. Additionally, unless the data is very small compared to the available backup space, a retention policy has to be implemented which automatically deletes old backups. At this point, people unfortunately tend to take the simplest possible approach and automatically delete snapshots older than X days; this is easily implemented using standard command line tools. However, a clearly more sophisticated and safer backup retention strategy is to also keep very old backups, just not all of them.
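
For illustration, the naive approach usually boils down to something like the following sketch (a hypothetical /backups directory with daily tarballs; everything older than 30 days is deleted, no matter how sparse the remaining history becomes):

$ find /backups -name '*.tar.gz' -mtime +30 -delete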

An obvious solution is to retain backups “logarithmically” distributed in time. The well-established backup solution rsnapshot does this. It creates a structure of hourly / daily / weekly / ... snapshots on the fly. Unfortunately, other backup approaches often lack such a fine-grained logic for eliminating old backups, and people tend to hack simple filters themselves. Furthermore, even rsnapshot is not able to post-process and thin out an existing set of snapshots. This is where timegaps comes in: you can use the backup solution of your choice for periodically (e.g. hourly) creating a snapshot. You can then — independently and at any time — process this set of snapshots with timegaps and identify those snapshots that need to be eliminated (removed or displaced) in order to maintain a certain “logarithmic” distribution of snapshots in time. This is the main motivation behind timegaps, but of course you can use it for filtering any kind of time-dependent data.

Usage example

Consider the following situation: all *.tar.gz files in the current working directory happen to be daily snapshots of something. The task is to accept one snapshot for each of the last 20 days, one for each of the last 8 weeks, and one for each of the last 12 months, and to move all others to the directory notneededanymore. Using timegaps, this is a simple task:

$ mkdir notneededanymore
$ timegaps --move notneededanymore days20,weeks8,months12 *.tar.gz

Done.

Design goals and development notes

Timegaps aims to be a slick, simple, reliable command line tool, ready to be applied in serious system administration workflows that actually touch data. It follows the Unix philosophy, has a well-defined command line interface, and has well-defined behavior with respect to stdin, stdout, stderr, and its exit code, so I expect it to be applied in combination with other command line tools such as find (see the sketch below). You should head over to the project page to see more usage examples and a detailed specification.
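
A hedged sketch of such a combination (hypothetical directory and file naming scheme; the --time-from-string format has to match the complete item string as emitted by find, and the generated script should of course be reviewed before execution):

$ find /backups -maxdepth 1 -name 'snapshot-*.tar.gz' | timegaps \
    --stdin --time-from-string '/backups/snapshot-%Y%m%d-%H%M%S.tar.gz' \
    days30,weeks8,months12 | sed 's/^/rm /' > cleanup.sh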

The timegaps Python code runs on both Unix and Windows, and on both Python 2 and 3. The same code base is used in all environments, so no automatic 2to3 conversion is involved. I made some effort to support unicode command line arguments on Windows and, at the same time, byte string paths on Unix, so I am pretty sure that timegaps works well with all kinds of exotic characters in your file names. The program respects the PYTHONIOENCODING environment variable when reading items from stdin and when writing items to stdout. That way, the user has definite control over item decoding and encoding.

For general quality assurance and testing the stability of behavior, timegaps is continuously checked against two classes of unit tests:

  • API tests, testing internally used functionality, such as the time categorization logic. Some tests are fed with huge random input data sets, and the output is checked against what is statistically expected.

  • Command line interface (CLI) tests, testing the program from the user’s perspective. To that end, I have started implementing a Python CLI testing framework, clitest.py. Currently, it is included in the timegaps code repository. At some point, I will probably create an independent open source project from that.

Resources