Category Archives: Python

How to set up a 64 bit version of NumPy on Windows

A short note on a complex topic. Feel free to shoot questions at me in the comments.

There are no official 64 bit NumPy builds available for Windows. In fact, 64 bit Windows is not officially supported by NumPy. So, if you are serious about your project, you either need to consider building on top of Unix-like platforms and inherit external quality assurance, or (on Windows) you need to anticipate issues of various kinds and do extensive testing on your own. One of the reasons is that there is no adequate (open source, reliable, feature-rich) tool chain for creating proper 64 bit builds of NumPy on Windows (further references: numpy mailing list thread, Intel forums). Nevertheless, in many cases the unofficial builds provided by Christoph Gohlke, created with Intel’s commercial compiler suite, are a working solution. It is up to you to understand the license implications and to decide whether you want to, or can, use these builds. I love to use these builds.

The following steps show a very simple way to get NumPy binaries for the AMD64 architecture installed on top of CPython 3(.4). These instructions are valid only for Python installed with an official CPython installer, obtained from

1) Install CPython for AMD64 arch

Download a 64 bit MSI installer file from The crucial step is to get an installer for the AMD64 (x86-64) architecture, usually called “Windows x86-64 MSI installer”. I have chosen python-3.4.2.amd64.msi. Run the setup.

2) Upgrade pip

Recent versions of Python 3 ship with pip, but you should use the newest version for proper wheel support. Open cmd.exe, and run

C:\> pip install pip --upgrade


C:\> pip --version
pip 6.0.8 from C:\Python34\lib\site-packages (python 3.4)

The latter verifies that this pip i) is up-to-date, and ii) belongs to our target CPython version (multiple versions of CPython can be installed on any given system, and the correspondence between pip and a certain Python build is sometimes not obvious).

Note: The CPython installer should properly adjust your PATH environment variable so that python as well as pip entered at the command line correspond to what has been installed by the installer. It is however possible that you have somehow lost control of your environment by installing too many different things in an unreasonable order. In that case, you might have to manually adjust your PATH so that it prioritizes the executables in C:\Python34\Scripts (or wherever you have installed your 64 bit Python version to).
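If in doubt about which interpreter the command line actually resolves to, and whether it is a 64 bit build, the interpreter can be asked directly. The following is a minimal sketch using only the standard library; the helper name interpreter_info is made up for this illustration:

```python
import struct
import sys


def interpreter_info():
    """Return (path of the running interpreter, pointer width in bits)."""
    # A C pointer ("P") occupies 8 bytes on a 64 bit build and 4 bytes
    # on a 32 bit build.
    bits = struct.calcsize("P") * 8
    return sys.executable, bits


if __name__ == "__main__":
    path, bits = interpreter_info()
    print("%s is a %d bit build" % (path, bits))
```

Run it with the same python command that you intend to use later on. A path outside of your 64 bit installation directory, or a reported width of 32, hints at a PATH problem.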

3) Download wheel of NumPy build for AMD64 on Windows

Navigate to and select a build for your Python version and for AMD64. I chose numpy-1.9.2rc1+mkl-cp34-none-win_amd64.whl.
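The wheel file name itself tells you whether a build matches your interpreter: according to the wheel naming convention (PEP 427), it consists of distribution name, version, an optional build tag, Python tag, ABI tag, and platform tag, joined by hyphens. As a rough sketch (the function parse_wheel_filename is a hypothetical helper, not part of pip), the name can be taken apart like this:

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("Not a wheel file name: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("Unexpected number of components: %r" % filename)
    return {
        "name": name,
        "version": version,
        "build_tag": build_tag,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


if __name__ == "__main__":
    info = parse_wheel_filename("numpy-1.9.2rc1+mkl-cp34-none-win_amd64.whl")
    # cp34 stands for CPython 3.4, win_amd64 for 64 bit Windows.
    print(info["python_tag"], info["platform_tag"])
```

For the file chosen above, the Python tag is cp34 and the platform tag is win_amd64, which is exactly the combination this note is about.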

4) Install the wheel via pip

On the command line, navigate to the directory where you have downloaded the wheel file to. Install it:

C:\Users\user\Desktop>pip install "numpy-1.9.2rc1+mkl-cp34-none-win_amd64.whl"
Processing c:\users\user\desktop\numpy-1.9.2rc1+mkl-cp34-none-win_amd64.whl
Installing collected packages: numpy
Successfully installed numpy-1.9.2rc1

The simplicity of this approach is fairly new. Actually, this simplicity is why wheels were designed in the first place! Installing pre-built binaries with pip was not possible with the “old” egg package format. So, older tutorials/descriptions of this kind might point to MSI installers or dubious self-extracting installers. Those times are over now, and this is also the main reason why I am writing this blog post.

5) Verify

>>> import numpy
>>> numpy.__version__


Third-party Python distributions

I do not want to leave unmentioned that there are very nice third-party Python distributions out there (i.e. not provided by the Python Software Foundation) that include commercially supported and properly tested NumPy/SciPy builds for 64 bit Windows platforms. Most of these third-party vendors have a commercial background and dictate their own licenses with respect to the usage of their Python distribution. For non-commercial purposes, most of them can be used for free. The following distributions provide a working solution:

All three of these distributions can be recommended from a technical point of view (I cannot tell whether their license models / restrictions are an issue for you or not). They all come as 64 bit builds. I am not entirely sure if Enthought and ActiveState build NumPy against Intel’s Math Kernel Library. In the case of Anaconda, the free version definitely is not built against it; this is something that can be obtained explicitly, for 29 $ (it’s called the “MKL Optimizations” package).

Songkick events for Google’s Knowledge Graph

Google can display upcoming concert events in the Knowledge Graph of musical artists (as announced in March 2014). This is a great feature, and probably many people in the field of music marketing and especially record labels aim to get this kind of data into the Knowledge Graph for their artists. However, Google does not magically find this data on its own. It needs to be informed, with a special kind of data structure (in the recently standardized JSON-LD format) contained within the artist’s website.

While of great interest to record labels, finding a proper technical solution to create and provide this data to Google still might be a challenge. I have prepared a web service that greatly simplifies the process of generating the required data structure. It pulls concert data from Songkick and translates them into the JSON-LD representation as required by Google. In the next section I explain the process by means of an example.

Web service usage example

The concert data of the band Milky Chance is published and maintained via Songkick, a service that many artists use. The following website shows — among others — all upcoming events of Milky Chance: My web service translates the data held by Songkick into the data structure that Google requires in order to make this concert data appear in their Knowledge Graph. This is the corresponding service URL that needs to be called to retrieve the data:

That URL is made of the base URL of the web service, the Songkick ID of the artist (6395144 in this case), the artist name and the artist website URL. Try accessing the named service URL in your browser. It currently yields this:

    "@context": "", 
    "@type": "MusicEvent", 
    "name": "Milky Chance", 
    "startDate": "2014-12-12", 
    "url": "", 
    "location": {
      "address": {
        "addressLocality": "Kiel", 
        "postalCode": "24116", 
        "streetAddress": "Eichhofstra\u00dfe 1", 
[ ... SNIP ~ 1000 lines of data ... ]
    "performer": {
      "sameAs": "", 
      "@type": "MusicGroup", 
      "name": "Milky Chance"

This piece of data needs to be included in the HTML source code of the artist website. Google then automatically finds this data and eventually displays the concert data in the Knowledge Graph (within a couple of days). That’s it — pretty simple, right? The good thing is that this method does not require layout changes to your website. This data can literally be included in any website, right now.
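JSON-LD data is usually embedded in a page by wrapping it in a script element of type application/ld+json. As a minimal sketch of that step (the function name jsonld_script_tag and the trimmed event below are made up for illustration; the real data comes from the web service):

```python
import json


def jsonld_script_tag(events):
    """Wrap JSON-LD data in a <script> element for embedding in HTML."""
    payload = json.dumps(events, indent=2)
    # Make sure that a literal "</script>" inside the data cannot close
    # the element early. "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n%s\n</script>' % payload


if __name__ == "__main__":
    # Trimmed, illustrative event. The service returns many more fields.
    events = [{
        "@type": "MusicEvent",
        "name": "Milky Chance",
        "startDate": "2014-12-12",
    }]
    print(jsonld_script_tag(events))
```

The resulting snippet can be pasted anywhere into the HTML source of the artist website, which is why no layout changes are needed.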

That is what happened in the case of Milky Chance: some time ago, the data created by the web service was fed into the Milky Chance website. Consequently, their concert data is displayed in their Knowledge Graph. See for yourself: access and look out for upcoming events on the right hand side. Screenshot:


Google Knowledge Graph generated for Milky Chance. Note the upcoming events section: for this to appear, Google needs to find the event data in a special markup within the artist’s website.

So, in summary, when would you want to use this web service?

  • You have an interest in presenting the concert data of an artist in Google’s Knowledge Graph (you are a record label or otherwise interested in improved marketing and user experience).
  • You have access to the artist website or know someone who has access.
  • The artist concert data already is present on Songkick or will be present in the future.

Then all you need is a specialized service URL, which you can generate with a small form I have prepared for you here:

Background: why Songkick?

Of course, the event data shown in the Knowledge Graph should be up to date and in sync with presentations of the same data in other places (bands usually display their concert data in many places: on Facebook, on their website, within third-party services, …). Fortunately, a lot of bands actually do manage this data in a central place (any other solution would be tedious). This central place/platform/service often is Songkick, because Songkick really did a nice job in providing people with what they need. My web service reflects recent changes made within Songkick.

Technical detail

The core of the web service is a piece of software that translates the data provided by Songkick into the JSON-LD data as required and specified by Google. The Songkick data is retrieved via Songkick’s JSON API (I applied for and got a Songkick API key). Large parts of this software deal with the unfortunate business of data format translation while handling certain edge cases.

The service is implemented in Python and hosted on Google App Engine. Its architecture is quite well thought-through (for instance, it uses memcache and asynchronous urlfetch wherever possible). It is ready to scale, so to say. Some technical highlights:

  • The web service enforces transport encryption (HTTPS).
  • Songkick back-end is queried via HTTPS only.
  • Songkick back-end is queried concurrently whenever possible.
  • Songkick responses are cached for several hours in order to reduce load on their service.
  • Responses of this web service are cached for several hours. These are served within milliseconds.
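The caching idea behind the last two points can be sketched with a small time-to-live cache. This is an in-process stand-in written for illustration only; the actual service uses App Engine’s memcache, and the class name TTLCache as well as the injectable clock are assumptions of this sketch:

```python
import time


class TTLCache(object):
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock  # Injectable so that expiry can be tested.
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a cache miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=3 * 3600)  # "several hours"
    cache.set("artist:6395144", ["event data would go here"])
    print(cache.get("artist:6395144"))
```

A cache hit avoids all upstream requests, which is what makes responses within milliseconds possible.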

This is an overview of the data flow:

  1. Incoming request, specifying Songkick artist ID, artist name, and artist website.
  2. Using the Songkick API (SKA), all upcoming events are queried for this artist (one or more SKA requests, depending on number of events).
  3. For each event, the venue ID is extracted, if possible.
  4. All venues are queried for further details (this entails as many SKA requests as venue IDs extracted).
  5. A JSON-LD representation of an event is constructed from a combination of
    • event data
    • venue data
    • user-given data (artist name and artist website)
  6. All event representations are combined and returned.
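The steps above can be sketched roughly as follows. This is not the code of the service: the two fetch functions are placeholders that the caller has to provide, the event dictionaries use made-up keys (start_date, venue_id), and a thread pool stands in for App Engine’s asynchronous urlfetch.

```python
from concurrent.futures import ThreadPoolExecutor


def build_events(artist_id, artist_name, artist_url, fetch_events, fetch_venue):
    """Combine event, venue and user-given data into JSON-LD-like dicts.

    fetch_events(artist_id) -> list of event dicts (hypothetical shape)
    fetch_venue(venue_id)   -> dict with venue details (hypothetical shape)
    """
    # Step 2: query all upcoming events for this artist.
    events = fetch_events(artist_id)

    # Step 3: extract venue IDs where possible, without duplicates.
    venue_ids = sorted(
        {e["venue_id"] for e in events if e.get("venue_id") is not None})

    # Step 4: query venue details concurrently.
    venues = {}
    if venue_ids:
        with ThreadPoolExecutor(max_workers=min(16, len(venue_ids))) as pool:
            venues = dict(zip(venue_ids, pool.map(fetch_venue, venue_ids)))

    # Step 5: build one representation per event.
    result = []
    for e in events:
        item = {
            "@type": "MusicEvent",
            "name": artist_name,
            "startDate": e["start_date"],
            "performer": {
                "@type": "MusicGroup",
                "name": artist_name,
                "sameAs": artist_url,
            },
        }
        venue = venues.get(e.get("venue_id"))
        if venue is not None:
            item["location"] = venue
        result.append(item)

    # Step 6: return the combined list.
    return result
```

Passing the fetch functions in as arguments keeps the translation logic testable without any network access.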

Some notable points in this context:

  • A single request to this web service might entail many requests to the Songkick API. This is why SKA responses are aggressively cached:
    • An example artist with 54 upcoming events requires 2 upcoming-events API requests (two pages, which cannot be requested concurrently) and roughly 50 venue API requests (which can be requested concurrently). In sum, this means that my web service cannot respond faster than three SKA round trip times (two sequential page requests plus one batch of concurrent venue requests).
    • If none of the SKA responses has been cached before, the retrieval of about 2 + 50 SKA responses might easily take about 2 seconds.
    • This web service cannot be faster than Songkick delivers.
  • This web service applies graceful degradation when extracting data from Songkick (many special cases are handled, which is especially relevant for the venue address).
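The lower bound of three round trips for the example artist can be reproduced with a bit of arithmetic. The sketch below assumes a page size of 50 events per request (an assumption; the text only states that 54 events span two pages) and that all venue requests fit into a single concurrent batch:

```python
import math


def min_round_trips(num_events, events_per_page):
    """Lower bound on sequential Songkick API round trips for one request."""
    # Event pages have to be fetched one after another. At least one
    # request is needed, even if there are no events.
    pages = max(1, int(math.ceil(num_events / float(events_per_page))))
    # All venue requests can go out at once, costing one more round trip.
    venue_batches = 1 if num_events > 0 else 0
    return pages + venue_batches


if __name__ == "__main__":
    print(min_round_trips(54, 50))  # 2 event pages + 1 venue batch
```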

Generate your service URL

This blog post is just an introduction, and sheds some light on the implementation and decision-making. For general reference, I have prepared this document to get you started:

It contains a web form where you can enter the (currently) three input parameters required for using the service. It returns a service URL for you. This URL points to my application hosted on Google App Engine. Using this URL, the service returns the JSON data that is to be included in an artist’s website. That’s all, it’s really pretty simple.

So, please go ahead and use this tool. I’d love to receive some feedback. Look closely at the data it returns, and keep your eyes open for subtle bugs. If you see something weird, please report it. I am very open to suggestions, and also interested in your questions regarding future plans, release cycle etc. Also, if you need support for (dynamically) including this kind of data in your artist’s website, feel free to contact me.

gipc 0.5.0 released

I just released gipc 0.5.0. It contains a backwards-incompatible change: the SIGPIPE signal action is not automatically reset to the default action anymore in child processes. This will hopefully satisfy developers expecting the SIGPIPE signal to be ignored, resulting in a proper Python exception when attempting to write to a pipe that has been closed on the read end (if interested, you can follow the related discussion here). Furthermore, this release improves the performance when sending large messages through gipc pipes on Linux. This release also introduces workarounds for two corner case issues present in Mac OS X and FreeBSD.

This is the full changelog:

  • Improve large message throughput on Linux (see issue #13).
  • Work around read(2) system call flaw on Mac OS X (see issue #13).
  • Work around signal.NSIG-related problem on FreeBSD (see issue #10).
  • Do not alter SIGPIPE action during child bootstrap (breaking change, see issue #12).
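The behavior that the SIGPIPE change aims at can be illustrated without gipc at all. The following sketch (POSIX only, plain os.pipe, to be run in the main thread) shows that with SIGPIPE ignored, writing to a pipe whose read end has been closed results in a Python exception instead of killing the process. The function name is made up for this illustration:

```python
import errno
import os
import signal


def write_to_closed_pipe():
    """Write to a pipe without reader and return the resulting errno."""
    # CPython ignores SIGPIPE by default. Setting it explicitly here
    # mirrors the state that is no longer reset in the child process.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    try:
        os.write(write_fd, b"x")
    except OSError as exc:
        # With SIGPIPE ignored, write(2) fails with EPIPE.
        return exc.errno
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    print(write_to_closed_pipe() == errno.EPIPE)
```

With the default SIGPIPE action restored instead (signal.SIG_DFL), the same write would terminate the process before any exception handler gets a chance to run.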

Thanks to Dustin Oprea, bra, John Ricklefs, Heungsub Lee, Miguel Turner, and Alex Besogonov for contributing. As usual, the release is available via PyPI ( The documentation and further detail are available at

Repopulate a Minecraft world from the command line


TerrainPopulated: 1 or not present (true/false) indicate whether the terrain in this chunk was populated with special things. (Ores, special blocks, trees, dungeons, flowers, waterfalls, etc.) If set to zero then Minecraft will regenerate these features in the blocks that already exist.

Sometimes people want to “repopulate” their worlds, i.e. set the TerrainPopulated property to False for all chunks of their world, in order to regenerate the special things mentioned above. MCEdit, a Minecraft world editor with a graphical user interface, can do this. However, there is a more straightforward solution to this task, especially if you are running a remote Minecraft server on a headless machine.

MCEdit is backed by pymclevel, a Python library for reading and modifying Minecraft worlds. It has been created by David Rio Vierra (kudos!) and has been in development since 2010. Its documentation is rather weak, but its API is not too difficult to understand. I could rather quickly come up with a small Python application that reads a world, iterates through all chunks, resets the TerrainPopulated property for all of them, and saves the world back to disk. This is the code, which I have only tested on our own world, but it should work just as well for yours:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Copyright 2014 Jan-Philip Gehrcke (

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.


Re-populate a minecraft world. That is, set the TerrainPopulated property
to False for all chunks in a world.

TerrainPopulated: 1 or not present (true/false) indicate whether the terrain
in this chunk was populated with special things. (Ores, special blocks, trees,
dungeons, flowers, waterfalls, etc.) If set to zero then Minecraft will
regenerate these features in the blocks that already exist.

Based on the notes in
and on the repop method used in

Usage: path/to/world/directory

A world directory is a single directory containing at least one
file named level.dat.
"""

import os
import sys
import time
import logging

# Note: datefmt and level are reconstructed so that the output matches
# the log lines shown below.
logging.basicConfig(
    format='%(asctime)s,%(msecs)-6.1f - %(module)s:%(levelname)s: %(message)s',
    datefmt='%H:%M:%S',
    level=logging.INFO)
log = logging.getLogger()

install_note = """
$ virtualenv pyvenv
$ source pyvenv/bin/activate
$ pip install cython numpy pyyaml
$ pip install git+git://
"""

try:
    from pymclevel import mclevel
except ImportError:
    sys.exit("Cannot import pymclevel. Consider setting it up via %s" %
        install_note)

usage = "%s path/to/world/directory" % os.path.basename(sys.argv[0])
if len(sys.argv) != 2:
    sys.exit("One argument is required. Usage: %s" % usage)

world_directory = sys.argv[1]
if not os.path.isdir(world_directory):
    sys.exit("Not a directory: %s" % world_directory)

log.info(("Attempting to read world. This scans the directory "
    "for chunks. Might take a while."))
world = mclevel.fromFile(world_directory)

log.info("Get chunk positions iterator.")
chunk_positions = world.allChunks

log.info("Iterate through chunks, set TerrainPopulated=0 for all of them.")
t0 = time.time()
for idx, (x, z) in enumerate(chunk_positions):
    if (idx + 1) % 1000 == 0:
        log.info("Processed %s chunks." % (idx + 1))
    # Retrieve an AnvilChunk object. This object will load and
    # decompress the chunk as needed, and remember whether it
    # needs to be saved or relighted.
    chunk = world.getChunk(x, z)
    chunk.TerrainPopulated = False
    # The above sets `chunk.dirty` which is processed during
    # `saveInPlace()` below. Leads to `saveChunk(cx, cz, data)`.
duration = time.time() - t0
chunks_per_sec = (idx + 1) / duration
log.info("Total number of modified chunks: %s." % (idx+1))
log.info("Duration: %.2f s. Chunks per second: %.2f." % (
    duration, chunks_per_sec))

# Save the level.dat and any chunks that have been marked for
# writing to disk. This also compresses any chunks marked for
# recompression.
log.info("Save modified world to disk (might take a moment).")
world.saveInPlace()
log.info("Exiting.")

I called it, and this is how it executes:

$ python schaumwelt
23:55:08,262.2  - materials:INFO: Loading block info from <open file 'pymclevel/minecraft.yaml', mode 'r' at 0x21546f0>
23:55:09,22.4   - materials:INFO: Loading block info from <open file 'pymclevel/classic.yaml', mode 'r' at 0x21546f0>
23:55:09,118.4  - materials:INFO: Loading block info from <open file 'pymclevel/indev.yaml', mode 'r' at 0x2154660>
23:55:09,233.3  - materials:INFO: Loading block info from <open file 'pymclevel/pocket.yaml', mode 'r' at 0x21546f0>
23:55:09,628.0  - repop:INFO: Attempting to read world. This scans the directory for chunks. Might take a while.
23:55:09,628.2  - mclevel:INFO: Identifying schaumwelt
23:55:09,628.5  - mclevel:INFO: Detected Infdev level.dat
23:55:09,728.0  - infiniteworld:INFO: Found dimension DIM1
23:55:09,728.5  - infiniteworld:INFO: Found dimension DIM-1
23:55:09,728.9  - infiniteworld:INFO: Found dimension DIM-17
23:55:09,729.2  - repop:INFO: Get chunk positions iterator.
23:55:09,729.3  - infiniteworld:INFO: Scanning for regions...
23:55:09,745.4  - regionfile:INFO: Found region file r.0.4.mca with 731/807 sectors used and 598 chunks present
23:55:10,672.4  - regionfile:INFO: Found region file r.-2.-1.mca with 1324/1418 sectors used and 1024 chunks present
23:55:10,674.2  - repop:INFO: Iterate through chunks, set TerrainPopulated=0 for all of them.
23:55:12,70.1   - regionfile:INFO: Found region file r.-2.1.mca with 2/2 sectors used and 0 chunks present
23:55:16,51.1   - regionfile:INFO: Found region file r.-2.3.mca with 2/2 sectors used and 0 chunks present
23:55:16,559.9  - regionfile:INFO: Found region file r.3.3.mca with 2/2 sectors used and 0 chunks present
23:55:18,69.6   - repop:INFO: Processed 1000 chunks.
23:55:18,958.6  - regionfile:INFO: Found region file r.-4.2.mca with 2/2 sectors used and 0 chunks present
23:55:21,334.1  - regionfile:INFO: Found region file r.2.-3.mca with 2/2 sectors used and 0 chunks present
23:55:26,524.2  - repop:INFO: Processed 2000 chunks.
23:55:35,573.8  - repop:INFO: Processed 3000 chunks.
23:55:44,324.8  - repop:INFO: Processed 4000 chunks.
00:00:18,820.3  - repop:INFO: Processed 35000 chunks.
00:00:28,940.9  - repop:INFO: Processed 36000 chunks.
00:00:38,30.5   - repop:INFO: Processed 37000 chunks.
00:00:41,787.5  - repop:INFO: Total number of modified chunks: 37426.
00:00:41,788.0  - repop:INFO: Duration: 331.11 s. Chunks per second: 113.03.
00:00:41,788.2  - repop:INFO: Save modified world to disk (might take a moment).
00:00:41,815.3  - infiniteworld:INFO: Saved 0 chunks (dim 1)
00:00:41,842.6  - infiniteworld:INFO: Saved 0 chunks (dim -1)
00:00:41,870.5  - infiniteworld:INFO: Saved 0 chunks (dim -17)
00:00:43,857.3  - regionfile:INFO: Found region file r.0.4.mca with 720/720 sectors used and 589 chunks present
[...]00:00:44,723.8  - regionfile:INFO: Found region file r.-2.-1.mca with 1310/1310 sectors used and 1014 chunks present
00:01:13,807.5  - infiniteworld:INFO: Saved 37426 chunks (dim 0)
00:01:13,808.0  - repop:INFO: Exiting.

Hope that helps.

Discourse Docker container: send mail through Exim


The Discourse deployment was greatly simplified by introducing Docker support (as I have written about before). Discourse heavily depends on e-mail, and its ability to send mail to arbitrary recipients is essential. While the recommended way is to use an external service like Mandrill, it is also possible to use a local MTA, such as Exim. However, when you set up the vanilla Discourse Docker container, it does not contain a pre-configured MTA, which is fine, since many have a well-configured MTA running on the host already. The question is how to use that MTA for letting Discourse send mail.

Usually, MTAs on smaller machines are configured to listen on localhost only, so that they are not exposed to the Internet and cannot be misused for spam. localhost on the host itself, however, is different from localhost within a Docker container. The network within the container is a virtual one, and it is cleanly separated from the host. That is, when Discourse running in a container tries to reach an SMTP server on localhost, it cannot reach an MTA listening on localhost outside of the container. There is a straightforward solution: Docker comes along with a network bridge. In fact, it provides a private network (in the 172.17.x.x range) that connects single containers with the host. This network can be used for establishing connectivity between a network application within a Docker container and the host.
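Whether a given address belongs to that private network can be checked with Python’s ipaddress module. This is a small sketch; it assumes that “the 172.17.x.x range” corresponds to the network 172.17.0.0/16, and the address in the usage example is purely hypothetical:

```python
import ipaddress

# Assumption: "172.17.x.x" is read as the /16 network below.
DOCKER_BRIDGE_NET = ipaddress.ip_network("172.17.0.0/16")


def in_docker_bridge_net(address):
    """Return True if the given IPv4 address lies in Docker's bridge net."""
    return ipaddress.ip_address(address) in DOCKER_BRIDGE_NET


if __name__ == "__main__":
    # Hypothetical bridge address on the host vs. the loopback address.
    print(in_docker_bridge_net("172.17.42.1"))
    print(in_docker_bridge_net("127.0.0.1"))
```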

Exim’s network configuration

Likewise, I have set up Exim4 on the Debian host for relaying mails that are incoming from localhost or from the local virtual Docker network. First I looked up the IP address of the docker bridge on the host, being in my case (got that from /sbin/ifconfig). I then instructed Exim to treat this as a local interface and to listen on it. Also, Exim was explicitly told to relay mail incoming from the subnet, otherwise it would reject incoming mails from that network. These are the relevant keys in /etc/exim4/update-exim4.conf.conf:


The config update is in place after calling update-exim4.conf and restarting Exim via service exim4 restart.

Testing SMTP access from within container

I tested if Exim’s SMTP server can be reached from within the container. I used the bare-bones SMTP implementation of Python’s smtplib for that. First of all, I SSHd into the container by calling launcher ssh app. I then called python. The following Python session demonstrates how I attempted to establish an SMTP connection right to the host via its IP address in Docker’s private network:

>>> import smtplib
>>> server = smtplib.SMTP('')
>>> server.set_debuglevel(1)
>>> server.sendmail("", "", "test")
send: 'ehlo []\r\n'
reply: '250-localhost Hello [] []\r\n'
reply: '250-SIZE 52428800\r\n'
reply: '250-8BITMIME\r\n'
reply: '250-PIPELINING\r\n'
reply: '250 HELP\r\n'
reply: retcode (250); Msg: localhost Hello [] []
SIZE 52428800
send: 'mail FROM:<> size=4\r\n'
reply: '250 OK\r\n'
reply: retcode (250); Msg: OK
send: 'rcpt TO:<>\r\n'
reply: '250 Accepted\r\n'
reply: retcode (250); Msg: Accepted
send: 'data\r\n'
reply: '354 Enter message, ending with "." on a line by itself\r\n'
reply: retcode (354); Msg: Enter message, ending with "." on a line by itself
data: (354, 'Enter message, ending with "." on a line by itself')
send: 'test\r\n.\r\n'
reply: '250 OK id=1X9bpF-0000st-Od\r\n'
reply: retcode (250); Msg: OK id=1X9bpF-0000st-Od
data: (250, 'OK id=1X9bpF-0000st-Od')

Indeed, the mail arrived at my Google Mail account. This test shows that the Exim4 server running on the host is reachable via SMTP from within the Discourse Docker instance. Until I got the configuration right, I observed essentially two different classes of errors:

  • socket.error: [Errno 111] Connection refused in case there is no proper network routing or connectivity established.
  • smtplib.SMTPRecipientsRefused: {'': (550, 'relay not permitted')} in case the Exim4 SMTP server is reachable, but rejecting your mail (to solve this I had to add the dc_relay_nets='' to the config shown above).
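The two error classes can be told apart programmatically. The following is a minimal sketch built on smtplib; the function name probe_relay and its return values are made up for illustration, and host, port and addresses have to be filled in for your setup. Note that it really sends a test message if the server accepts it.

```python
import smtplib
import socket


def probe_relay(host, port, sender, recipient, timeout=5):
    """Try to send a test mail and classify the outcome as a string."""
    try:
        server = smtplib.SMTP(host, port, timeout=timeout)
    except (socket.error, smtplib.SMTPException):
        # No routing or connectivity, e.g. "Connection refused".
        return "unreachable"
    try:
        server.sendmail(sender, [recipient], "test")
    except smtplib.SMTPRecipientsRefused:
        # Server is reachable but rejects the recipient,
        # e.g. (550, 'relay not permitted').
        return "relay_refused"
    except smtplib.SMTPException:
        return "smtp_error"
    finally:
        try:
            server.quit()
        except smtplib.SMTPException:
            pass
    return "accepted"
```

Calling it from within the container with the host’s address in the Docker network should return "accepted" once both the listening interface and the relay network are configured.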

Obviously, in order to make Discourse use that SMTP server, it needs to be configured with DISCOURSE_SMTP_ADDRESS being set to the IP address of the host in the Docker network, i.e. in my case.

Hope that helps!