Building numpy and scipy with Intel compilers and Intel MKL on a 64 bit machine

At work we make heavy use of the Python/numpy/scipy stack. We have built numpy and scipy against Intel’s MKL, with Intel’s C++ compiler icc and Intel’s Fortran compiler ifort. As far as I know, and also from my own experience, this combination of tools gets you close to optimum performance for numerical simulations on classical hardware, especially for linear algebra calculations. If you get the build right and write proper code, i.e. use numpy’s data types and numpy’s/scipy’s functions the right way, then, generally speaking, no other software package, commercial or open source, is able to squeeze more performance out of your hardware. I just updated the installation to the most recent Python 2 (2.7.6), numpy 1.8.0 and scipy 0.13.3. I built on a 64 bit machine with the Intel suite version 12.1.3; newer versions of the Intel suite have trouble resolving library dependencies (which is the topic of a separate blog post). The documentation on building numpy and scipy with the Intel suite is sparse, so I describe the procedure I took in this article.

A few notes on performance and version decisions: compiling numpy and scipy against MKL provides a significant single-thread performance boost over classical builds. Furthermore, building against Intel’s OpenMP library can provide very efficient multicore performance through automatic threading of normal math operations. You want this boost, believe me, so it makes a lot of sense to build numpy/scipy with the Intel suite. However, it does not really make sense to build (C)Python with the Intel suite. It is not worth the effort; somewhere I have read that such a build might even be slower than an optimized GCC build. In any case, Python is not so much about math. It is about reliability and control. We should do math in numpy/scipy and use optimized builds for them, in which case there is little to gain from optimizing the Python build itself. Regarding the Intel suite, I am pretty sure that newer versions do not provide large performance improvements compared to 12.1.3. These reasons made me build things the following way:

  • CPython 2.7.6 in classical configure/make/make install fashion, with the system’s GCC.
  • numpy and scipy with Intel suite (MKL, icc, ifort) version 12.1.3.


I assume that you have built Python as a non-privileged user and installed it to some location in your file system (btw: never just overwrite your system’s Python with a custom build!). Set up the environment for this build, so that

$ python

invokes it. You can validate this via $ python --version and $ command -v python. In the following steps, we will work as the same user that built and installed Python: numpy and scipy installation files will go right into the directory tree of the custom Python build.
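
The same check can be done from inside Python, which also tells you whether the current user may write to the site-packages directory that numpy and scipy will be installed into. This is only a minimal sketch using the standard library; the helper name describe_interpreter is made up for illustration.

```python
import os
import sys
import sysconfig


def describe_interpreter():
    """Collect a few facts about the running interpreter (hypothetical helper)."""
    site_packages = sysconfig.get_paths()["purelib"]
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "site_packages": site_packages,
        # numpy and scipy are installed here, so the build user needs write access.
        "writable": os.access(site_packages, os.W_OK),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("%-14s %s" % (key, value))
```

If the prefix points into your system's Python tree, or the directory is not writable, the environment is probably not set up the way the rest of this article assumes.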

I also assume that you have a working copy of MKL, icc, ifort, and that you have set it up like this:

$ source /path_to_intel_compilers/bin/ intel64

After sourcing it, your PATH and LD_LIBRARY_PATH environment variables should contain the directories where all the Intel binaries and libraries reside. A simple test (which you also should perform):

$ icc --version
icc (ICC) 12.1.3 20120212
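
If you prefer to check the environment programmatically, something along these lines should do. It is a sketch under assumptions: the helper names find_on_path and mentions are mine, and searching LD_LIBRARY_PATH for the fragment "mkl" simply assumes that the MKL directory has "mkl" in its path, as in the layout used in this article.

```python
import os


def find_on_path(name, path=None):
    """Return the first executable called `name` on `path`, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def mentions(fragment, variable="LD_LIBRARY_PATH", environ=None):
    """Return the entries of a path-like variable that contain `fragment`."""
    environ = os.environ if environ is None else environ
    entries = environ.get(variable, "").split(os.pathsep)
    return [entry for entry in entries if fragment in entry]


if __name__ == "__main__":
    for tool in ("icc", "ifort"):
        print("%-6s %s" % (tool, find_on_path(tool)))
    print("MKL entries in LD_LIBRARY_PATH: %s" % mentions("mkl"))
```

Both compilers should resolve to a path below your Intel installation, and the list of MKL entries should not be empty.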

Prepare, build, install, and validate numpy

In the numpy source directory create the file site.cfg with the following content:

[mkl]
library_dirs = /path_to_intel_compilers/mkl/lib/intel64/
include_dirs = /path_to_intel_compilers/mkl/include/
mkl_libs = mkl_rt
lapack_libs =

That is right, no more contents are needed in that file, it can be that simple. This approach is also recommended in
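
Since a typo in site.cfg tends to go unnoticed (the build may silently fall back to other libraries), it can be worth sanity-checking the file before building. The following is a rough sketch, assuming the settings live in an [mkl] section as in numpy's site.cfg.example; the function name check_site_cfg is hypothetical.

```python
import os

try:
    import configparser                    # Python 3
except ImportError:
    import ConfigParser as configparser    # Python 2


def check_site_cfg(path, section="mkl"):
    """Return a list of human-readable problems found in a site.cfg file."""
    parser = configparser.RawConfigParser()
    try:
        if not parser.read(path):
            return ["cannot read %s" % path]
    except configparser.MissingSectionHeaderError:
        return ["missing [%s] section" % section]
    if not parser.has_section(section):
        return ["missing [%s] section" % section]
    problems = []
    for key in ("library_dirs", "include_dirs"):
        if not parser.has_option(section, key):
            problems.append("%s is not set" % key)
            continue
        # Several directories may be given, separated by the path separator.
        for directory in parser.get(section, key).split(os.pathsep):
            if not os.path.isdir(directory):
                problems.append("%s: no such directory: %s" % (key, directory))
    if not parser.has_option(section, "mkl_libs"):
        problems.append("mkl_libs is not set")
    return problems


if __name__ == "__main__":
    for problem in check_site_cfg("site.cfg"):
        print(problem)
```

An empty result only means that the file is well-formed and the directories exist; it does not prove that the libraries in there are usable.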

What is left to be done is setting compiler flags. The build process uses compiler abstractions stored in the two files: numpy/distutils/fcompiler/ and numpy/distutils/ You might feel the need to treat these files with respect, because they appear to be so important. I am frightened of these files, because they are partly inconsistent in themselves and look bloated given the very simple purpose I expect them to serve. My point is: do not be afraid, and make some drastic reductions in order to be sure about what happens during the build. In numpy/distutils/fcompiler/ you can safely edit the get_flags* methods of the IntelEM64TFCompiler class, so that they look like this:

class IntelEM64TFCompiler(IntelFCompiler):
    compiler_type = 'intelem'
    def get_flags(self):
        return ['-O3 -g -xhost -openmp -fp-model strict -fPIC']
    def get_flags_opt(self):
        return []
    def get_flags_arch(self):
        return []

See, this compiler class describes itself as intelem, and that is the compiler type we are going to use in the build command line. The class name “IntelEM64TFCompiler” itself has no special meaning here. We are just editing one of the classes, making sure that it has the compiler flags we want, and eventually using this compiler abstraction during the build. In the code above, I have made sure that the compiler flags '-O3 -g -xhost -openmp -fp-model strict -fPIC' are used by making the method get_flags return them and making all other related methods return “nothing”. Additionally (not shown above), I have also edited the possible_executables attribute of the class:

possible_executables = ['ifort']

just to make sure that ifort will be used. I have also removed all compiler classes not needed from that file, just to get a better overview.

With the steps above, the Fortran compiler has been set up. Next, edit numpy/distutils/ for configuring the C compiler. I have deleted tons of stuff from this file. What follows is all the content remaining (I think you can copy/paste it like this; it should be safe for most purposes):

from __future__ import division, absolute_import, print_function
from distutils.unixccompiler import UnixCCompiler
from numpy.distutils.exec_command import find_executable
class IntelEM64TCCompiler(UnixCCompiler):
    """ A modified Intel x86_64 compiler compatible with a 64bit gcc built Python.
    """
    compiler_type = 'intelem'
    cc_exe = 'icc -O3 -g -fPIC -fp-model strict -fomit-frame-pointer -openmp -xhost'
    #cc_args = "-fPIC"
    def __init__ (self, verbose=0, dry_run=0, force=0):
        UnixCCompiler.__init__ (self, verbose, dry_run, force)
        compiler = self.cc_exe
        # Use the same icc command line for compiling and for linking.
        self.set_executables(compiler=compiler,
                             compiler_so=compiler,
                             compiler_cxx=compiler,
                             linker_exe=compiler,
                             linker_so=compiler + ' -shared')

Regarding the compiler flags, I trusted the Intel people somewhat and adopted most of the flags they recommend. I read the docs for all of them, and they seem to make sense. But you should think these through for your environment. It is good to know what they mean.

Then build numpy:

python build --compiler=intelem | tee build.log

Copy the build to the Python tree (install):

python install | tee install.log

Validate that the numpy build works:

20:20:57 $ python
Python 2.7.6 (default, Feb 18 2014, 15:09:15) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
>>> numpy.test()
Running unit tests for numpy
NumPy version 1.8.0
NumPy is installed in /projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/numpy
Python version 2.7.6 (default, Feb 18 2014, 15:09:15) [GCC 4.1.2 20080704 (Red Hat 4.1.2-52)]
nose version 1.3.0
[... snip ...]
Ran 4969 tests in 63.975s
<nose.result.TextTestResult run=4969 errors=0 failures=0>

That looks great. Proceed.
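
Passing unit tests do not tell you whether numpy really uses MKL. numpy.show_config() prints the BLAS/LAPACK build information, and a quick matrix product gives a feeling for the speed. The timing helper below is just an illustrative sketch (time_dot is a made-up name, and the matrix size is arbitrary); it is not a proper benchmark.

```python
import time


def time_dot(n=1000, repeats=3):
    """Return the best wall-clock time in seconds for an n x n matrix product.

    Returns None when numpy cannot be imported.
    """
    try:
        import numpy
    except ImportError:
        return None
    a = numpy.random.rand(n, n)
    b = numpy.random.rand(n, n)
    best = None
    for _ in range(repeats):
        start = time.time()
        numpy.dot(a, b)
        elapsed = time.time() - start
        best = elapsed if best is None else min(best, elapsed)
    return best


if __name__ == "__main__":
    seconds = time_dot()
    if seconds is None:
        print("numpy is not importable")
    else:
        import numpy
        numpy.show_config()  # shows which BLAS/LAPACK libraries numpy was built against
        print("best time for a 1000 x 1000 dot product: %.3f s" % seconds)
```

Running the same script against a classical numpy build on the same machine gives you a direct comparison.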

Build, install, and validate scipy

Now that numpy is installed(!), we can go ahead with building scipy. Extract the scipy source, enter the source directory and invoke

python config --compiler=intelem --fcompiler=intelem build_clib --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem --fcompiler=intelem install | tee build_install.log

See how we enforce usage of intelem (which is just a label/name) for all components here? This is important. The scipy build process uses the same build settings as numpy, especially the distutils compiler abstractions (which is why numpy needs to be installed first; this is a simple fact that the official docs do not explain well). I built and installed in a single step, on purpose. When doing build and install in separate steps, the install step involves some minor compilation tasks which are then performed using gfortran instead of ifort. At first I thought that this was not an issue, but when executing scipy.test() I soon got a segmentation fault, due to the mixing of compilers. When using the command as above, the test result is positive:
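
One way to catch accidental compiler mixing before running the test suite is to search the build log for compiler names that should not be there. The snippet below is a small sketch of that idea; the function name and the default list of "foreign" compilers are assumptions of mine, and a hit is only a hint that needs a closer look, not proof of a broken build.

```python
import re

# Compiler names that should not show up in an all-Intel build log.
# This default list is an assumption for illustration; adjust it to your system.
FOREIGN_COMPILERS = ("gfortran", "g77")


def foreign_compiler_lines(lines, names=FOREIGN_COMPILERS):
    """Return (line_number, line) pairs that mention one of `names` as a whole word."""
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(re.escape(name) for name in names))
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, 1)
            if pattern.search(line)]
```

It can be applied to the log written by tee, e.g. foreign_compiler_lines(open("build_install.log")). Names are matched as whole words, so a mention of libgfortran does not count as a hit.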

Python 2.7.6 (default, Feb 18 2014, 15:09:15) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy
>>> scipy.test()
Running unit tests for scipy
NumPy version 1.8.0
NumPy is installed in /projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/numpy
SciPy version 0.13.3
SciPy is installed in /projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/scipy
Python version 2.7.6 (default, Feb 18 2014, 15:09:15) [GCC 4.1.2 20080704 (Red Hat 4.1.2-52)]
nose version 1.3.0
/projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/numpy/lib/ DeprecationWarning: `scipy.lib.blas` is deprecated, use `scipy.linalg.blas` instead!
  warnings.warn(depdoc, DeprecationWarning)
/projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/numpy/lib/ DeprecationWarning: `scipy.lib.lapack` is deprecated, use `scipy.linalg.lapack` instead!
  warnings.warn(depdoc, DeprecationWarning)
[ ... snip ...]
ERROR: test_fitpack.TestSplder.test_kink
Traceback (most recent call last):
  File "/projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/nose/", line 197, in runTest
  File "/projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/scipy/interpolate/tests/", line 329, in test_kink
    splder(spl2, 2)  # Should work
  File "/projects/bioinfp_apps/Python-2.7.6/lib/python2.7/site-packages/scipy/interpolate/", line 1186, in splder
    "and is not differentiable %d times") % n)
ValueError: The spline has internal repeated knots and is not differentiable 2 times
Ran 8934 tests in 131.506s
FAILED (KNOWNFAIL=115, SKIP=220, errors=1)
<nose.result.TextTestResult run=8934 errors=1 failures=0>

The one failing test is a corner case known to fail in some scenarios. I might be able to narrow it down and help resolve the issue.

Congratulations, done. Enjoy the performance.

  • entron

    Very helpful! I just have one minor question. If I manually set the environment variables of the Intel suite in a terminal, everything works fine. But if I launch ipython qtconsole from the system menu instead of launching it from a terminal, the ipython qtconsole will not find the libs. I tried to put the line setting the environment variables in .bashrc or .profile, but neither works. Do you know how to solve this problem? Thanks in advance.

    • Hello, this is a common issue in various scenarios, and your suspicion is correct, you have just proven that your environment in the terminal differs from that which processes see when started “from the desktop”.

      In short: try to reboot, it should then also work from the system menu.

      Interactive non-login shells, like your terminal, interpret the .bashrc file upon every invocation, i.e. changes in that file take immediate effect when you start a new terminal.

      The desktop session, however, lives in a shell that you cannot restart without shutting down your desktop session. Hence, in order to make environment changes take effect for processes started from the system menu, you are often required to restart the desktop session, or reboot the entire machine.

      Whether or not a reboot really is required in order to make environment changes take effect “for the desktop” likely depends on the Linux distribution and desktop environment used, but I have seen this often. In theory, there should be simpler ways, like sourcing the new .bashrc file in the context of the old desktop shell. In practice, I have never dealt with this, but you might want to look into it…

  • This is the best article I’ve read regarding MKL. Thank you for putting in the time to writing this and making this topic less obscure!

  • Jason

    You are a freaking genius. I’ve been tearing my hair out for weeks trying to follow the cryptic directions from Intel’s site. They need to take down their crappy explanations and put a link to this article on there. If I ever see you at a conference, I’m buying you like 30 beers.

    • Love that comment, thanks! Which conferences are you going to? ;)

      • Jason

        Haha. Glad you enjoyed that. Quick question for you…

        I’m using the -O3 flag, which you did also and which is a “more aggressive compiler optimization” according to Intel’s site and I’m getting 3 failed tests on SciPy. I’m assuming the corner case they are referencing is the same corner case that failed for you above.

        None of my 3 failures involves the test above, though. The three that fail for me are:

        Looking online at other sources, apparently the tests pass fine when the less aggressive -O1 flag is used. I know I can always default to that if necessary, but I was wondering if you had any insight into what might be causing that. Perhaps there is an additional flag I can pass or adjust that would allow me to keep the -O3 flag intact?

        • There is only general advice I can give, because different Scipy sources as well as different compiler versions as well as different hardware environments can produce different outcomes. And, most importantly, different application scenarios have different requirements.

          General advice #1: Aggressive optimization usually is what you want!

          General advice #2: if a certain test fails, it is always worth investigating why and how the test fails. You’ll learn something new, which is always good. Then, you’ll need to ask yourself at least these two questions: i) how badly did the test fail? ii) does your application require this component at all?

          Depending on the outcome of these questions, you might come up with even more questions and finally make a decision.

          In the case I described in the article, the phenomenon was already known and in the process of being fixed, and my application did not need the component. It therefore was easy to decide to use the build.

          • Jason

            Thanks for the advice. After thorough searching, it looks like it is a known bug with the Intel fortran compiler.


            That’s unfortunate, but hey, thanks to your tutorial, I know I’ll be able to build NumPy and SciPy with the -O3 flag if they ever get it sorted out!

            I’m doing general data science work, not developing any apps per se so it’s hard to say if I would ever use those particular features or not, but I do make heavy use of Scikit-learn which depends on NumPy and SciPy so I’ll take the safe road for now and use the -O1 flag.

            Thanks again for the quick replies! I know you wrote this about a year-and-a-half ago, but as you can see from the recent comments by myself and others, this is an evergreen topic considering how important these two packages are to the Python world.

  • Romain Mauriac

    This is a great tutorial. I am trying to build numpy from source inside a conda environment. It seems to work; I only have one problem:
    error: [Errno 13] Permission denied: ‘build/src.linux-x86_64-3.4/numpy/distutils/’
    It seems that the build folder has restricted access. Any idea on how to resolve this?

    • Probably you did not perform the entire build process with a single user only. A simple recursive chmod/chown should resolve your issue. But you really need to debug this individually.

  • Romain Mauriac

    Never mind, I just removed the restricted folder and built numpy again, and it worked. You are really, REALLY helpful. Your tutorial is the only one that actually helped me install numpy with MKL. Thank you very much