Monthly Archives: February 2014

Building numpy and scipy with Intel compilers and Intel MKL on a 64 bit machine

At work we make heavy use of the Python/numpy/scipy stack. We have built numpy and scipy against Intel’s MKL, with Intel’s C++ compiler icc and Intel’s Fortran compiler ifort. As far as I know, and also from my own experience, this combination of tools gets you close-to-optimum performance for numerical simulations on classical hardware, especially for linear algebra calculations. If you get the build right and write proper code, i.e. use numpy’s data types and numpy’s/scipy’s functions the right way, then, generally speaking, no commercial or other open source software package is able to squeeze more performance out of your hardware. I just updated the installation to the most recent Python 2 (2.7.6), numpy 1.8.0 and scipy 0.13.3. I built on a 64 bit machine with the Intel suite version 12.1.3; newer versions of the Intel suite have trouble resolving library dependencies during this build (an issue this post touches on). The documentation on building numpy and scipy with the Intel suite is sparse, so I describe the procedure I took in this article.

A few notes on performance and version decisions: compiling numpy and scipy against MKL provides a significant single-thread performance boost over classical builds. Furthermore, building against Intel’s OpenMP library can provide extremely efficient multicore performance through automatic threading of normal math operations. You want this boost, believe me, so it makes a lot of sense to build numpy/scipy with the Intel suite. However, it does not really make sense to build (C)Python itself with the Intel suite. It is not worth the effort; I have read that such a build might even be slower than an optimized GCC build. In any case, Python is not so much about math as about reliability and control. We should do the math in numpy/scipy and use optimized builds for those; then there simply is no point in optimizing the Python build itself. Regarding the Intel suite, I am pretty sure that newer versions do not provide large performance improvements over 12.1.3. These reasons made me build things the following way:

  • CPython 2.7.6 in classical configure/make/make install fashion, with the system’s GCC.
  • numpy and scipy with Intel suite (MKL, icc, ifort) version 12.1.3.

Prerequisites

I assume that you have built Python as a non-privileged user and installed it to some location in your file system (btw: never just overwrite your system’s Python with a custom build!). Set up the environment for this build so that

$ python

invokes it. You can validate this via $ python --version and $ command -v python. In the following steps, we will work as the same user that built and installed Python: the numpy and scipy installation files will go right into the directory tree of the custom Python build.

I also assume that you have a working installation of MKL, icc, and ifort, and that you have set up the environment like this:

$ source /path_to_intel_compilers/bin/compilervars.sh intel64

After invoking compilervars.sh, your PATH and LD_LIBRARY_PATH environment variables should contain the directories where the Intel binaries and libraries reside. A simple test (which you should also perform):

$ icc --version
icc (ICC) 12.1.3 20120212

Prepare, build, install, and validate numpy

In the numpy source directory create the file site.cfg with the following content:

[mkl]
library_dirs = /path_to_intel_compilers/mkl/lib/intel64/
include_dirs = /path_to_intel_compilers/mkl/include/
mkl_libs = mkl_rt
lapack_libs =

That is right, no more content is needed in that file; it can be that simple. mkl_libs = mkl_rt links against MKL’s single dynamic runtime library, which resolves the actual computational libraries at run time. This approach is also recommended in http://software.intel.com/en-us/articles/numpy-scipy-with-mkl.

What is left to be done is setting compiler flags. The build process uses compiler abstractions stored in two files: numpy/distutils/fcompiler/intel.py and numpy/distutils/intelccompiler.py. You might feel the need to treat these files with respect, because they appear to be so important. I am frightened of these files, because they are partly inconsistent in themselves and have a bloated appearance not justified by the very simple purpose I expect them to serve. My point is: do not be afraid, and make some drastic reductions in order to be sure about what happens during the build. In numpy/distutils/fcompiler/intel.py you can safely edit the get_flags* methods of the IntelEM64TFCompiler class so that they look like this:

class IntelEM64TFCompiler(IntelFCompiler):
    compiler_type = 'intelem'
    [...]
 
    def get_flags(self):
        return ['-O3 -g -xhost -openmp -fp-model strict -fPIC']
 
    def get_flags_opt(self):
        return []
 
    def get_flags_arch(self):
        return []

See, this compiler class describes itself as intelem, and that is the compiler type we are going to select on the build command line. The class name “IntelEM64TFCompiler” itself has no meaning at all. We are just editing one of the classes, making sure that it carries the compiler flags we want, and will eventually select this compiler abstraction during the build. In the code above, I have made sure that the compiler flags '-O3 -g -xhost -openmp -fp-model strict -fPIC' are used by having the method get_flags return them, and by making all other related methods return an empty list. Additionally (not shown above), I have also edited the possible_executables attribute of the class:

possible_executables = ['ifort']

just to make sure that ifort will be used. I have also removed all compiler classes that are not needed from that file, just to get a better overview.

With the above steps, the Fortran compiler has been set up. Next, edit numpy/distutils/intelccompiler.py to configure the C compiler. I have deleted tons of stuff from this file. What follows is all the remaining content (you can copy/paste it as-is; I think this should be safe for most purposes):

from __future__ import division, absolute_import, print_function
 
from distutils.unixccompiler import UnixCCompiler
from numpy.distutils.exec_command import find_executable
 
class IntelEM64TCCompiler(UnixCCompiler):
    """ A modified Intel x86_64 compiler compatible with a 64bit gcc built Python.
    """
    compiler_type = 'intelem'
    cc_exe = 'icc -O3 -g -fPIC -fp-model strict -fomit-frame-pointer -openmp -xhost'
    #cc_args = "-fPIC"
    def __init__ (self, verbose=0, dry_run=0, force=0):
        UnixCCompiler.__init__ (self, verbose, dry_run, force)
        compiler = self.cc_exe
        self.set_executables(compiler=compiler,
                             compiler_so=compiler,
                             compiler_cxx=compiler,
                             linker_exe=compiler,
                             linker_so=compiler + ' -shared')

Regarding the compiler flags, I trusted the Intel people somewhat and took most of them from http://software.intel.com/en-us/articles/numpy-scipy-with-mkl. I read the docs for all of them, and they seem to make sense. But you should think them through for your environment; it is good to know what they mean.

Then build numpy:

python setup.py build --compiler=intelem | tee build.log

Copy the build to the Python tree (install):

python setup.py install | tee install.log

Validate that the numpy build works:

20:20:57 $ python
Python 2.7.6 (default, Feb 18 2014, 15:09:15) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.8.0'
>>> numpy.test()
Running unit tests for numpy
NumPy version 1.8.0
NumPy is installed in /projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/numpy
Python version 2.7.6 (default, Feb 18 2014, 15:09:15) [GCC 4.1.2 20080704 (Red Hat 4.1.2-52)]
nose version 1.3.0
[... snip ...]
----------------------------------------------------------------------
Ran 4969 tests in 63.975s
 
OK (KNOWNFAIL=5, SKIP=3)
<nose.result.TextTestResult run=4969 errors=0 failures=0>

That looks great. Proceed.
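Besides running the test suite, it is worth confirming that this numpy build is really linked against MKL. A quick sketch (with a successful MKL build, mkl_rt should show up in the libraries entries of the printed configuration):

import numpy
# Prints the BLAS/LAPACK configuration that was detected at build time;
# for an MKL build, 'mkl_rt' should appear among the libraries.
numpy.show_config()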

Build, install, and validate scipy

Now that numpy is installed(!), we can go ahead with building scipy. Extract the scipy source, enter the source directory and invoke

python setup.py config --compiler=intelem --fcompiler=intelem build_clib --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem --fcompiler=intelem install | tee build_install.log

See how we enforce usage of intelem (which is just a label/name) for all components here? This is important. The scipy build process uses the same build settings as numpy did, especially the distutils compiler abstraction stuff (which is why numpy needs to be installed first; a simple fact that the official docs do not explain well). I built and installed in one step, on purpose. When doing build and install in separate steps, the install step involves some minor compilation tasks which are then performed using gfortran instead of ifort. At first I thought this was not an issue, but when executing scipy.test() I soon got a segmentation fault due to the mixing of compilers. When using the command as above (basically taken from http://software.intel.com/en-us/articles/numpy-scipy-with-mkl), the test result is positive:

Python 2.7.6 (default, Feb 18 2014, 15:09:15) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy
>>> scipy.test()
Running unit tests for scipy
NumPy version 1.8.0
NumPy is installed in /projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/numpy
SciPy version 0.13.3
SciPy is installed in /projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/scipy
Python version 2.7.6 (default, Feb 18 2014, 15:09:15) [GCC 4.1.2 20080704 (Red Hat 4.1.2-52)]
nose version 1.3.0
/projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/numpy/lib/utils.py:134: DeprecationWarning: `scipy.lib.blas` is deprecated, use `scipy.linalg.blas` instead!
  warnings.warn(depdoc, DeprecationWarning)
/projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/numpy/lib/utils.py:134: DeprecationWarning: `scipy.lib.lapack` is deprecated, use `scipy.linalg.lapack` instead!
  warnings.warn(depdoc, DeprecationWarning)
 
[ ... snip ...]
======================================================================
ERROR: test_fitpack.TestSplder.test_kink
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/scipy/interpolate/tests/test_fitpack.py", line 329, in test_kink
    splder(spl2, 2)  # Should work
  File "/projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/scipy/interpolate/fitpack.py", line 1186, in splder
    "and is not differentiable %d times") % n)
ValueError: The spline has internal repeated knots and is not differentiable 2 times
 
----------------------------------------------------------------------
Ran 8934 tests in 131.506s
 
FAILED (KNOWNFAIL=115, SKIP=220, errors=1)
<nose.result.TextTestResult run=8934 errors=1 failures=0>

The one failing test is a corner case known to fail in some scenarios (https://github.com/scipy/scipy/issues/2911). I might be able to narrow it down and help resolve that issue.
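By the way, if you want to convince yourself of MKL’s automatic multithreading, here is a minimal benchmark sketch (the matrix size is chosen arbitrarily): run it once with the environment variable MKL_NUM_THREADS set to 1 and once without that restriction, and compare the wall times.

import time
import numpy

# A large matrix product is dispatched to MKL's DGEMM, which threads
# automatically across all cores unless MKL_NUM_THREADS restricts it.
a = numpy.random.rand(3000, 3000)
b = numpy.random.rand(3000, 3000)
t0 = time.time()
numpy.dot(a, b)
print "matrix product took %.3f s" % (time.time() - t0)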

Congratulations, done. Enjoy the performance.

Python 2 on Windows: how to read command line arguments containing Unicode code points

While in the Unix world UTF-8 is the de facto standard for terminal input and output encoding, the situation on Windows is a bit more complex. In general, Windows is even a step ahead of Unix systems here: Unicode code points in command line arguments are supported natively when using cmd.exe or PowerShell. The Win32 API has corresponding functions for retrieving such strings as native Unicode data types.

Python 2(.7), however, does not make use of these functions. Instead, it tries to read arguments as byte sequences. Characters not included in the 7-bit ASCII range end up as ? in the byte strings in sys.argv.

Another issue might be that by default Python does not use UTF-8 for encoding characters in the stdout stream (for me, the default stdout encoding is the more limited code page cp437).
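You can check what your Python uses from within Python itself. A one-liner sketch (note that in Python 2, sys.stdout.encoding is None when stdout is redirected to a file and PYTHONIOENCODING is not set):

import sys
# On an interactive cmd.exe session this prints e.g. 'cp437'; with stdout
# redirected and PYTHONIOENCODING unset, the attribute is None.
print sys.stdout.encoding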

I don’t want to waste too many words here; there are quite reliable workarounds for both issues. The stdout encoding can be enforced with the PYTHONIOENCODING environment variable. chcp 65001 sets the console code page to a UTF-8-like encoding, so that special characters can be used as command line arguments in a UTF-8-encoded batch file, such as this test.bat:

@chcp 65001 > nul
@set PYTHONIOENCODING=utf-8
python test.py ☺

This is the Python script test.py for printing information about the retrieved command line arguments:

import sys

# win32_unicode_argv() is defined further below; in a real test.py, place
# the function definition above this call.
sys.argv = win32_unicode_argv()
print repr(sys.argv)
for a in sys.argv:
    print(a.encode(sys.stdout.encoding))

Open a terminal (cmd.exe) and execute

c:\> test.bat > out

Then have a look at the file out into which we just redirected the stdout stream of the Python script (tell your editor/file viewer to decode the file using UTF-8, and use a proper font that has the required glyphs!):

c:\> python test.py ☺ 
[u'test.py', u'\u263a']
test.py
☺

As you can see, the items in argv are unicode strings. This is the magic performed by the function win32_unicode_argv(), which I will show below. When encoding these unicode strings to sys.stdout.encoding (which here in fact is UTF-8, as set via the environment variable PYTHONIOENCODING), the special Unicode code point ☺ becomes properly encoded.

All in all, using chcp 65001 + PYTHONIOENCODING="utf-8" + win32_unicode_argv(), we got a well-behaved information stream from the UTF-8-encoded input file test.bat to the UTF-8-encoded output file out.

This is win32_unicode_argv(), which uses the ctypes module to call the Win32 API functions that Windows provides for retrieving command line arguments as native Win32 Unicode strings:

import sys

def win32_unicode_argv():
    # Solution copied from http://stackoverflow.com/a/846931/145400

    from ctypes import POINTER, byref, cdll, c_int, windll
    from ctypes.wintypes import LPCWSTR, LPWSTR

    # GetCommandLineW returns the full command line as a native wide string.
    GetCommandLineW = cdll.kernel32.GetCommandLineW
    GetCommandLineW.argtypes = []
    GetCommandLineW.restype = LPCWSTR

    # CommandLineToArgvW splits a wide string into an array of arguments.
    CommandLineToArgvW = windll.shell32.CommandLineToArgvW
    CommandLineToArgvW.argtypes = [LPCWSTR, POINTER(c_int)]
    CommandLineToArgvW.restype = POINTER(LPWSTR)

    cmd = GetCommandLineW()
    argc = c_int(0)
    argv = CommandLineToArgvW(cmd, byref(argc))
    if argc.value > 0:
        # Skip the interpreter executable and any interpreter options, so
        # that the result aligns with the original sys.argv.
        start = argc.value - len(sys.argv)
        return [argv[i] for i in xrange(start, argc.value)]
    # Fall back to the plain byte string arguments if the API call fails.
    return sys.argv

Kudos to http://stackoverflow.com/a/846931/145400.

Rule of three

A random but notable finding on codinghorror. There are two “rules of three” in software reuse:

  • It is three times as difficult to build reusable components as single use components, and
  • a reusable component should be tried out in three different applications before it will be sufficiently general to accept into a reuse library.

That kind of wisdom actually comes from this book.

Also, I want this: Masters of Doom: How Two Guys Created an Empire and Transformed Pop Culture :-)

Save single page from PDF file as PNG image file

In the open source world, the best choice for PDF command line foo (and PDF foo in general) is almost always Ghostscript. This is a quick way to extract a single page from a PDF file and save it as a PNG file with a given resolution (in dpi):

#!/bin/bash
INFILE="$1"
OUTFILE="$2"
PAGE="$3"
RES="$4"

# png16m is Ghostscript's 24-bit RGB PNG device; -dBATCH and -dNOPAUSE
# make the run non-interactive.
gs -dBATCH -dNOPAUSE -sDEVICE=png16m \
    -r"$RES" \
    -dFirstPage="$PAGE" \
    -dLastPage="$PAGE" \
    "-sOutputFile=$OUTFILE" \
    "$INFILE"

Example usage:

./pdf_page_to_png.sh input.pdf output_p3.png 3 200

“Apparatus and method for improving the air” …

… is the poised and funny title of US patent 2147435, filed by Ernst Gehrcke in 1935:

US patent 2147435 heading

Ernst Gehrcke was an experimental physicist (see his Wikipedia entry). Despite his last name, I do not think that we have a lot in common, except for being physicists. Regarding said patent, the main idea sounds hilarious these days:

This invention relates to a new apparatus and a method for improving the air, i. e. to produce an artificial climate. It consists in admixing a powdery preparation in a finely divided state with air. When breathing air pretreated in such a manner, a very agreeable and refreshing sensation is caused.

There also is a beautiful drawing:

US patent 2147435 Figures 1 & 2.

Claim 15 of the patent is a good summary of the entire idea:

An apparatus for improving the air comprising a container, a powdered material having the property of stimulating respiration therein, means for retaining the heavier particles of said powder in said container, an opening for allowing the escape of the lighter particles, means for moving the entire container to agitate the mass of material, whereby said lighter particles rise in and pass out of said container, and means for causing a current of air to flow in proximity to said lighter particles, said flow being outside of said container, to transport the same to an individual to be treated.

By the way, according to this article, in the United States “the text and drawings of a patent are typically not subject to copyright restrictions.”