Monthly Archives: February 2014

Distributing a Python command line application

In this article I show how to create a minimal Python command line application, called ‘bootstrap’. I describe how to set it up for publication on PyPI, after which the user can conveniently install it via pip install cmdline-bootstrap. The installation immediately makes the ‘bootstrap’ command available to the user, for convenient invocation on Unix as well as on Windows. I show how to make the example application live within a proper package structure and how to make it callable and testable in different convenient ways. All of this works on Python 2 and Python 3 (I tested it on CPython 2.7 and 3.3).

Update March 25, 2014: Thanks for all the feedback. I have updated the article in many places. The template structure is now using a __main__.py for convenience (see below). I have created a git repository from this template structure. Feel free to clone, fork, and star: python-cmdline-bootstrap on GitHub.

Background

There are many ways to achieve the same thing. In the paragraphs below, I try to give proper advice, including current official recommendations and schemes well established in the Python community. One thing you need to know, and have probably already realized yourself, is that Python packaging and package distribution can be quite tedious. In past years, the recommendations for doing things “the right way” have changed often. Finally, however, it looks like we have something definite which I can base my article on.

Besides using the right tools, such as setuptools (instead of distribute) and twine, there is a lot of complexity hidden in the details of the file/directory structure, and in the way you organize your application in terms of packages and modules. When do you need absolute or relative imports? What would be a convenient entry point for your application? For which parts do you need a wrapper? What is the right way to test the application without installing it? I do not want to go deeply into all these details in this article, but rather present a working solution. Rest assured that I have consulted official docs and guidelines, and taken a deeper look at how various established applications (such as sphinx, coverage, pep8, pylint) are set up in this regard. I have also consulted several great answers on StackOverflow (e.g. this, this, this, this, and this), and finally implemented things myself (also here).

For this article, I try to break down all this valuable input into a minimal, bare-bones bootstrap project structure that should get you going. I try to reduce complexity, avoid confusing constructs, and not discuss difficulties any further from here on. The outcome is a very short and simple thing, really.

File structure

I recommend the following basic structure:

python-cmdline-bootstrap/
├── docs
├── test
├── bootstrap
│   ├── __init__.py
│   ├── __main__.py
│   ├── bootstrap.py
│   └── stuff.py
├── bootstrap-runner.py
├── LICENSE
├── MANIFEST.in
├── README.rst
└── setup.py

I have created a git repository from this structure template: python-cmdline-bootstrap on GitHub. Feel free to clone and fork.

It might look arbitrary in parts, but it is not. Clarification:

  • All relevant application code is stored within the bootstrap package (which is the bootstrap/ directory containing the __init__.py file).
  • bootstrap-runner.py is just a simple wrapper script that allows for direct execution of the command line application from the source directory, without the need to ‘install’ the application.
  • bootstrap/__main__.py makes the bootstrap directory executable as a script.
  • bootstrap/bootstrap.py is meant to be the main module of the application. This module contains a function main() which is the entry point of the application.
  • bootstrap/stuff.py is just an example for another module containing application logic, which can be imported from within bootstrap.py
  • README.rst and LICENSE should be clear.
  • MANIFEST.in makes sure that (among others) the LICENSE file is included in source distributions created with setuptools.
  • setup.py contains instructions for setuptools. It is executed when you, the creator, create a distribution file and when the user installs the application. Below, I describe how to configure it so that setuptools creates an executable upon installation.

File contents: bootstrap package

Below are the contents of the files in the bootstrap package, i.e. the application logic. Remember, you can find all this on GitHub.

__init__.py:
This file makes the bootstrap directory a package. In simple cases, it can be left empty. We make use of that and leave it empty.

bootstrap.py:

# -*- coding: utf-8 -*-
 
 
"""bootstrap.bootstrap: provides entry point main()."""
 
 
__version__ = "0.2.0"
 
 
import sys
from .stuff import Stuff
 
 
def main():
    print("Executing bootstrap version %s." % __version__)
    print("List of argument strings: %s" % sys.argv[1:])
    print("Stuff and Boo():\n%s\n%s" % (Stuff, Boo()))
 
 
class Boo(Stuff):
    pass

As stated above, this module contains the function which is the main entry point to our application. We commonly call this function main(). Importing the module does not call main(); it only runs when some other piece of code calls it explicitly. This happens, for instance, when the bootstrap directory is executed as a script, which is the magic performed by __main__.py, described below.

Some more things worth discussing in the bootstrap.py module:

  • The module imports from other modules in the package. Therefore it uses relative imports. Implicit relative imports are forbidden in Python 3. from .stuff import Stuff is an explicit relative import, which you should make use of whenever possible.
  • People often define __version__ in __init__.py. Here, we define it in bootstrap.py, because it is simpler to access from within bootstrap.py (;-)) and still accessible from within setup.py (where we also need it).

stuff.py:

# -*- coding: utf-8 -*-
 
 
"""bootstrap.stuff: stuff module within the bootstrap package."""
 
 
class Stuff(object):
    pass

As you can see, the bootstrap.stuff module defines a custom class. As mentioned above, bootstrap.bootstrap uses an explicit relative import to import this class.

__main__.py:

# -*- coding: utf-8 -*-
 
 
"""bootstrap.__main__: executed when bootstrap directory is called as script."""
 
 
from .bootstrap import main
main()

Certain workflows require the bootstrap directory to be treated both as a package and as the main script, via $ python -m bootstrap. This invocation executes the package’s __main__.py file if it exists (and fails if it does not). Within this file, we simply import our main entry point function (relative import!) and invoke it.

Executing the application: running the entry point function

You might be tempted to run $ python bootstrap.py, which fails with ValueError: Attempted relative import in non-package (on Python 2; Python 3 reports the same problem with a different exception). Is something wrong with the file structure or imports? No, it is not. The invocation is wrong.

The right thing is to cd into the project’s root directory, and then execute

 $ python -m bootstrap arg1

Output:

Executing bootstrap version 0.2.0.
List of argument strings: ['arg1']
Stuff and Boo():
<class 'bootstrap.stuff.Stuff'>
<bootstrap.bootstrap.Boo object at 0x7f6e975e0b10>

Does this look unusual to you? Well, this is not a single-file Python script anymore. You are designing a package, and Python packages have special behavior. This is normal. The $ python -m package kind of invocation is well established, and your package should support it. As you can see in the output above, command line arguments are passed through as expected.
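If you want to see both invocations side by side without touching the real project, here is a self-contained sketch. It writes a throwaway package to a temporary directory (demo_pkg, with core.py standing in for bootstrap.py; both names are hypothetical), runs it via -m, and then runs core.py directly as a script.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Hypothetical throwaway package "demo_pkg" mirroring the bootstrap layout.
FILES = {
    "__init__.py": "",
    "stuff.py": "class Stuff(object):\n    pass\n",
    "core.py": (
        "import sys\n"
        "from .stuff import Stuff\n"
        "def main():\n"
        "    print('args: %s' % sys.argv[1:])\n"
    ),
    "__main__.py": "from .core import main\nmain()\n",
}


def run(cmd, cwd):
    """Run cmd in cwd and return (exit code, combined stdout/stderr)."""
    proc = subprocess.Popen(cmd, cwd=cwd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return proc.returncode, out.decode("utf-8")


def demo():
    root = tempfile.mkdtemp()
    try:
        pkg = os.path.join(root, "demo_pkg")
        os.mkdir(pkg)
        for name, source in FILES.items():
            with open(os.path.join(pkg, name), "w") as f:
                f.write(source)
        # Works: the package is imported, then demo_pkg/__main__.py runs.
        as_module = run([sys.executable, "-m", "demo_pkg", "arg1"], root)
        # Fails: core.py runs as a top-level script, so ".stuff" has no parent package.
        as_script = run([sys.executable, os.path.join("demo_pkg", "core.py")], root)
        return as_module, as_script
    finally:
        shutil.rmtree(root)


if __name__ == "__main__":
    for code, output in demo():
        print("exit code %d:\n%s" % (code, output))
```

The -m run should exit with status 0 and print the argument list; the direct run should exit with a non-zero status and a relative-import error whose exact type and wording depend on the Python version.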

There is a straightforward way to achieve the “normal” behavior that you are used to. That is what the convenience wrapper bootstrap-runner.py is made for. Its content:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
 
 
"""Convenience wrapper for running bootstrap directly from source tree."""
 
 
from bootstrap.bootstrap import main
 
 
if __name__ == '__main__':
    main()

This should be self-explanatory; still: it imports the entry point function main from module bootstrap.bootstrap and, if executed by itself as a script, invokes this function. Hence, you can use bootstrap-runner.py as a normal script, i.e. as the command line front end to your application. Set permissions via $ chmod u+x bootstrap-runner.py and execute it:

$ ./bootstrap-runner.py argtest
Executing bootstrap version 0.2.0.
List of argument strings: ['argtest']
Stuff and Boo():
<class 'bootstrap.stuff.Stuff'>
<bootstrap.bootstrap.Boo object at 0x7f5402343b50>

Straightforward, right? You can now use $ python -m bootstrap or bootstrap-runner.py for testing or production purposes, without the need to install the application.

Preparing setup.py

Code upfront:

# -*- coding: utf-8 -*-
 
 
"""setup.py: setuptools control."""
 
 
import re
from setuptools import setup
 
 
with open("bootstrap/bootstrap.py") as f:
    version = re.search(
        r'^__version__\s*=\s*"(.*)"',
        f.read(),
        re.M
        ).group(1)
 
 
with open("README.rst", "rb") as f:
    long_descr = f.read().decode("utf-8")
 
 
setup(
    name = "cmdline-bootstrap",
    packages = ["bootstrap"],
    entry_points = {
        "console_scripts": ['bootstrap = bootstrap.bootstrap:main']
        },
    version = version,
    description = "Python command line application bare bones template.",
    long_description = long_descr,
    author = "Jan-Philip Gehrcke",
    author_email = "jgehrcke@googlemail.com",
    url = "http://gehrcke.de/2014/02/distributing-a-python-command-line-application",
    )

Some things to discuss:

  • It might appear trivial, but from setuptools import setup is the currently recommended way to go.
  • Your setup.py should not import your package to read the version number. This can fail for the end user, for example when the package imports dependencies that are not yet installed at setup time. Instead, always read the version directly from the source file. Here, I used a regular expression to extract it; how you do it is up to you. But never import your own module/package.
  • The setup function has many more useful arguments than shown here. For a serious project, read the docs and make proper use of author, classifiers, platforms, etc.
  • I have called the project cmdline-bootstrap here instead of just bootstrap, because I actually upload it to PyPI later on (see below). And “bootstrap”, although still free, is just too popular a name to use for something this small.

The essential arguments here are packages and entry_points. packages = ["bootstrap"] tells setuptools that we want to install our bootstrap package to the user’s site-packages directory. The console_scripts item 'bootstrap = bootstrap.bootstrap:main' instructs setuptools to generate a script called bootstrap. This script invokes bootstrap.bootstrap:main, i.e. the main function of our bootstrap.bootstrap module, our application entry point. This is the same thing bootstrap-runner.py does; the difference is that setuptools automatically creates the wrapper script in the user’s file system when they install the application via pip install cmdline-bootstrap. setuptools places this wrapper into a directory that is in the user’s PATH, i.e. it immediately makes the bootstrap command available to the user. This also works on Windows, where a small .exe file is created in something like C:\Python27\Scripts.
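To make the “module:function” notation concrete, here is a simplified sketch of how such a target string can be resolved and invoked. It is not the actual setuptools or pip code, and the helper names parse_console_script, resolve_target and run_console_script are hypothetical. The generated wrappers do pass the return value of main() to sys.exit(), so returning None (as our main() does) results in exit status 0.

```python
import importlib
import sys


def parse_console_script(line):
    """Split an item like 'bootstrap = bootstrap.bootstrap:main' into (name, target)."""
    name, _, target = line.partition("=")
    return name.strip(), target.strip()


def resolve_target(target):
    """Import the module before the colon, then look up the (possibly dotted)
    attribute after it. Extras such as '[extra]' are not handled here."""
    module_name, _, attr_path = target.partition(":")
    obj = importlib.import_module(module_name.strip())
    for attr in attr_path.strip().split("."):
        obj = getattr(obj, attr)
    return obj


def run_console_script(line):
    """Roughly what a generated wrapper does: call the function and use its
    return value as the exit status (None means 0)."""
    _, target = parse_console_script(line)
    sys.exit(resolve_target(target)())
```

From the project root, run_console_script('bootstrap = bootstrap.bootstrap:main') would behave much like typing bootstrap with the same sys.argv.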

Testing the setup

We use virtualenv to reproduce what users see: once for CPython 2(.7) and once for CPython 3(.3). Create both environments:

$ virtualenv --python=/path/to/python27 venvpy27
...
$ virtualenv --python=/path/to/python33 venvpy33
...

Activate the 2.7 environment, and install the bootstrap application:

$ source venvpy27/bin/activate
$ python setup.py install
running install
running bdist_egg
running egg_info
[...]
Installed /xxx/venvpy27/lib/python2.7/site-packages/cmdline_bootstrap-0.2.0-py2.7.egg
Processing dependencies for cmdline-bootstrap==0.2.0
Finished processing dependencies for cmdline-bootstrap==0.2.0

See if (and where) the command has been created:

$ command -v bootstrap
/xxx/venvpy27/bin/bootstrap

Try it:

$ bootstrap arg
Executing bootstrap version 0.2.0.
List of argument strings: ['arg']
Stuff and Boo():
<class 'bootstrap.stuff.Stuff'>
<bootstrap.bootstrap.Boo object at 0x7f1234d31190>

Great. Repeat the same steps for venvpy33, and validate:

$ command -v bootstrap
/xxx/venvpy33/bin/bootstrap
$ bootstrap argtest
Executing bootstrap version 0.2.0.
List of argument strings: ['argtest']
Stuff and Boo():
<class 'bootstrap.stuff.Stuff'>
<bootstrap.bootstrap.Boo object at 0x7f4cf931a550>

A note on automated tests

In the test/ directory you can set up automated tests for your application. You can always directly import the development version of your modules from e.g. test/test_api.py, if you modify sys.path:

import os
import sys
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir)))
from bootstrap.stuff import Stuff

If you need to directly test the command line interface of your application, then bootstrap-runner.py is your friend. You can easily invoke it from e.g. test/test_cmdline.py via the subprocess module.
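A minimal sketch of such a test/test_cmdline.py, assuming the layout above (test/ next to bootstrap-runner.py). It invokes the runner through sys.executable rather than relying on the shebang line and the execute permission, so it should also work on Windows and inside a virtualenv. The helper names runner_path and run_cli are hypothetical.

```python
import os
import subprocess
import sys
import unittest


def runner_path():
    """Path to bootstrap-runner.py, assuming this file lives in test/."""
    here = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(here, os.pardir, "bootstrap-runner.py")


def run_cli(args, script=None):
    """Run a script with the current interpreter; return (exit code, output)."""
    script = script or runner_path()
    proc = subprocess.Popen(
        [sys.executable, script] + list(args),
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return proc.returncode, out.decode("utf-8")


class TestCommandLine(unittest.TestCase):

    def test_arguments_are_passed_through(self):
        if not os.path.exists(runner_path()):
            self.skipTest("bootstrap-runner.py not found")
        code, out = run_cli(["arg1", "arg2"])
        self.assertEqual(code, 0)
        # bootstrap's main() prints sys.argv[1:] as a list.
        self.assertIn("['arg1', 'arg2']", out)
```

Run it from the project root, e.g. with python -m unittest discover -s test.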

Upload your distribution file to PyPI

Create a source distribution of your project; by default, this is a gzipped tarball:

$ python setup.py sdist
$ /bin/ls dist
cmdline-bootstrap-0.2.0.tar.gz

Register your project with PyPI. Then use twine to upload your project (twine is still to be improved!):

$ pip install twine
$ twine upload dist/cmdline-bootstrap-0.2.0.tar.gz 
Uploading distributions to https://pypi.python.org/pypi
Uploading cmdline-bootstrap-0.2.0.tar.gz
Finished

Final test: install from PyPI

Create another virtual environment, activate it, install cmdline-bootstrap from PyPI and execute it:

$ virtualenv --python=/xxx/bin/python3.3 venvpy33test
...
$ source venvpy33test/bin/activate
$ bootstrap
bash: bootstrap: command not found
$ pip install cmdline-bootstrap
Downloading/unpacking cmdline-bootstrap
  Downloading cmdline-bootstrap-0.2.0.tar.gz
  Running setup.py egg_info for package cmdline-bootstrap
 
Installing collected packages: cmdline-bootstrap
  Running setup.py install for cmdline-bootstrap
 
    Installing bootstrap script to /xxx/venvpy33test/bin
Successfully installed cmdline-bootstrap
Cleaning up...
 
$ bootstrap testarg
Executing bootstrap version 0.2.0.
List of argument strings: ['testarg']
Stuff and Boo():
<class 'bootstrap.stuff.Stuff'>
<bootstrap.bootstrap.Boo object at 0x7faf433edb90>

That’s it. I hope this is of use to some of you. All code is available on GitHub.

Concatenate byte strings in Python 3

What is the recommended way to concatenate (two or a few) byte strings in Python 3? What is the recommended way to do so if the code should work for both Python 2 and 3?

In Python 2 we can easily concatenate two or more byte strings using string formatting:

>>> a = "\x61"
>>> b = "\x62"
>>> "%s%s" % (a, b)
'ab'

It is payback time: in Python 3(.3), repr("%s" % b"a") semi-intuitively returns '"b\'a\'"', and b"%s" % b"a" raises TypeError: unsupported operand type(s) for %: 'bytes' and 'bytes' (%-formatting for bytes was only reintroduced in Python 3.5, via PEP 461). This is the result of Python 3’s strict distinction between text (a sequence of Unicode code points) and bytes (a sequence of raw bytes). Consequently, in Python 3, concatenating byte strings via string formatting yields something entirely different from what Python 2 does:

>>> a = b"\x61"
>>> b = b"\x62"
>>> "%s%s" % (a,b)
"b'a'b'b'"

The outcome is text (a sequence of Unicode code points) instead of a byte string (a sequence of bytes). The %s conversion calls str() on each bytes object, and for bytes this returns the same as repr(). So the result of this “concatenation” is the concatenation of the representations of these byte strings.

Concatenating two byte strings

In Python 3, the + operator (__add__) returns what we want:

>>> a + b
b'ab'

The + operator also works in Python 2:

>>> a + b
'ab'

Concatenating many byte strings

The above is the preferred method if you want to concatenate only two byte strings. If you have a longer sequence of byte strings to concatenate, the good old join() works in both Python 2.7 and 3.x.

Python 3 output:

>>> b"".join([a, b])
b'ab'

Python 2.7 output:

>>> b"".join([a, b])
'ab'

In Python 3, the 'b' prefix on the separator string is crucial (join a sequence of bytes objects with a separator of type bytes). In Python 2.7, the 'b' prefix has no effect, since string literals are byte strings by default anyway. In Python versions before 2.6, the 'b' prefix is a syntax error.

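For a largish sequence of byte strings, join()-based concatenation is clearly more efficient than +-based concatenation: bytes objects are immutable, so each + creates a new object and copies everything accumulated so far, whereas join() computes the total size once and copies each piece only once.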
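A small sketch for checking this on your own interpreter: it builds the same result both ways, asserts that they agree, and times both with timeit. The chunk content, chunk count and repetition count are arbitrary choices, and the absolute numbers depend on interpreter and machine, so no expected timings are given.

```python
import timeit


def concat_plus(chunks):
    """Concatenate with +=: each step builds a new bytes object."""
    result = b""
    for chunk in chunks:
        result += chunk
    return result


def concat_join(chunks):
    """Concatenate with join(): total size first, then one copy per chunk."""
    return b"".join(chunks)


chunks = [b"\x61\x62"] * 5000
assert concat_plus(chunks) == concat_join(chunks)

for func in (concat_plus, concat_join):
    seconds = timeit.timeit(lambda: func(chunks), number=10)
    print("%s: %.4f s" % (func.__name__, seconds))
```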

A brief interjection on WotzApp

A new star in the sky of billion-dollar corporations. We should not praise WhatsApp so much for being simple to use and for working. Others can do that too. Other things matter. You may remember: until recently, it was very easy to send WhatsApp messages in other people’s names. That has become harder, but it is still possible. So what is actually certain about WhatsApp?

  • The user is the product.
  • Privacy has low priority.

The former has been clear since February 19, 2014, at the latest. As for the second point: messages sent over a local Wi-Fi network, for example, can be read with relatively simple means. There are surely many other minor and major privacy problems, but all of these things interest only a small fraction of the general public.

WhatsApp: no standards, no security. Just 90s-style chat for your phone.

In the beginning, the WhatsApp engineers worked quick & dirty and did not base the fundamentals of their architecture on common standards. IT security and privacy without adhering to standards? That is ill-posed from the outset, in the mathematical sense. We have been aware of the carelessness in WhatsApp’s technical implementation for years. We know perfectly well what WhatsApp is at its core: a very simple, utterly indifferent form of chat. Security and privacy do not matter at all. One thing already struck me years ago: you do not have to “log in” to WhatsApp; there is no (shared) secret.

Crypto crash course

You do not need to dig any deeper into WhatsApp to be highly skeptical of it regarding privacy. People write to each other without exchanging a secret first. Here is a tiny little crypto excursion; maybe I will reach a narrow mass, that is, a small part of the broad masses. A secret is something that only YOU know. Your phone number is not a secret. Your IMEI is not a secret either. Without a secret, you can

  • not securely authenticate yourself (unambiguously prove your identity),
  • not securely encrypt data,
  • not guarantee the integrity of transmitted data.

Conversely, for communication without a secret, this means:

  • Anyone can (with more or less effort) send messages in your name.
  • Anyone on the communication path between you and the recipient (Wi-Fi, ISP, …) can read your messages.
  • Anyone on the communication path between you and the recipient (Wi-Fi, ISP, …) can modify your messages.
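To make the integrity point concrete, here is a minimal sketch using Python’s standard hmac module (hmac.compare_digest needs Python 2.7.7+ or 3.3+). It assumes that the two parties have already exchanged a secret key somehow; that exchange, and encryption, are out of scope. This is a generic illustration, not how WhatsApp or any other messenger mentioned here actually works, and the key and phone number are made-up values.

```python
import hashlib
import hmac


def tag(secret, message):
    """Compute an HMAC-SHA256 tag over message using a shared secret key."""
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify(secret, message, received_tag):
    """Compare tags in constant time. False means the message was modified
    or was not produced by someone holding the secret."""
    return hmac.compare_digest(tag(secret, message), received_tag)


secret = b"only-alice-and-bob-know-this"  # hypothetical pre-shared secret
message = b"Meet at 8"
t = tag(secret, message)

print(verify(secret, message, t))            # True
print(verify(secret, b"Meet at 9", t))       # False: content was modified
print(verify(b"+49 123 4567", message, t))   # False: a phone number is no secret
```

Someone on the path who changes the message cannot compute a matching tag without the secret; with only public data such as a phone number, verification fails.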

What do people actually want? Not so much security.

That is enough for the crash course. But let’s be honest: e-mail and SMS suffer from the same problems. And ICQ and Skype do not offer theoretically complete protection either, even though they do use a secret, namely the login credentials (that is the wrong kind of secret, but we will not go into that here). And DE-Mail is genuinely infuriating, because it promises security that does not exist.

Hardly anyone cares about any of this. And I believe this is the core of it, an important insight: people do not necessarily want real security. Mostly, they simply do not care whether “someone” can read along. Is the general public somewhere between naive and deluded here? Maybe, but it does not matter. Conceptually perfect IT security and everyday human communication do not really fit together. We saw the same indifference with the NSA scandal. Where is the outcry? #aufschrei? #aufschrei3000? To some extent, I do not care about all this either; after all, I use ICQ, send e-mails and SMS, and have recently become a WhatsApp user. And for each of these technologies, I know exactly how they can be attacked. Haven’t you ever used tcpdump on your router to capture your roommates’ messages ;-)?

But please be aware of what is happening here.

What matters, in my opinion, is awareness of who can collect data about whom, and on what scale. And awareness that you may be selling yourself. Take a look at this official WhatsApp blog post from 2012:

Remember, when advertising is involved you the user are the product.

They explain there that the WhatsApp user is not the product, because they do not use advertising. Then they talk about their oh-so-great architecture, and say that the simple form of communication is their product, not the user or their data:

That’s our product and that’s our passion. Your data isn’t even in the picture. We are simply not interested in any of it.

$19 billion for what exactly? Oh, right, of course.

That always sounded shady. Lately, however, these statements appear in a particularly shabby light. Let me put it simply:

  • $19 billion for an (IT) architecture that many others could have built better? Nope.
  • $19 billion for a huge number of users? Yes.

So what is the product? The users, exactly, as always. Be aware of that.

One more thing: heml.is and BitTorrent Chat

If you ever really need secure communication, you have to know where to get it. The media have just settled on Threema. Good for the Swiss; the cash registers are surely ringing. As far as I can tell, it is done in a cryptographically solid way. They have adhered to recognized standards and work with real secrets. The payload, that is, the message contents, seems to be secure. But you should know that the people behind Threema can of course also see and collect metadata, i.e. who communicates with whom, when, how much, and so on (actually everything except what, and maybe why :-)). Also, unlike some competitors, Threema is not free. By the way: you do not need to feel guilty about installing a free app. The Flappy Bird guy made $50,000 a day in advertising revenue just from being present in the respective app stores.

I would like to draw your attention to heml.is and BitTorrent Chat. https://heml.is/ has been in development by three Swedes for quite a while. Through their planning and communication policy, they make an extremely likable and professional impression. They are close to release, and people are already tweeting at them that they had better release right now, blah blah, but they respond quite coolly:

A car without wheels may be 99% complete but is pretty useless, right?

So, remember https://heml.is/. It makes a better impression than Threema. BitTorrent Chat is also very promising, in a different way. As always with anything around “Torrent”, it follows a decentralized approach: a self-regulating P2P network. Only with such an approach can anonymity be achieved (similar to TOR), and only this way can the efficient collection of metadata be prevented. BitTorrent Chat is not finished yet either, but it is close to release.

Conditionally raising an exception in Python: short-circuit evaluation of “raise”

You have probably used a shortcut construct like this:

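# Placeholder values so that the snippet runs on its own.
result_a = "result A"
result_b = "result B"

def conditional_result(switch):
    if switch:
        return result_a
    return result_b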

Obviously, an else is not required, because the execution flow leaves the function once it comes across a return statement. Have you ever felt like doing the same with exceptions? In other words:

def conditional_error(switch):
    if switch:
        raise ErrorA
    raise ErrorB

Does that work the same way? Sure. Sure? After ErrorA is raised, can the execution flow ever magically re-enter the function conditional_error right below the if block, where it was exceptionally left? Generators can do this (with the yield statement).

The answer is that the above code is safe and behaves as expected, analogous to the conditional_result function. The explanation from A Programmer’s Introduction to Python:

The raise statement does two things: it creates an exception object, and immediately leaves the expected program execution sequence to search the enclosing try statements for a matching except clause. The effect of a raise statement is to either divert execution in a matching except suite, or to stop the program because no matching except suite was found to handle the exception.

For the conditional_error function above this means: if there is a matching except clause, it is outside of the function (there definitely is no handler within the function, do you see one?). In this case the execution flow steps out of the function without saving the current state (as a generator or a co-routine in general would do) — the function is really, absolutely left and the program proceeds at a different point: the matching except clause. If there is no matching except clause, the program stops.

This is how it looks live:

class ErrorA(Exception):
    pass
 
 
class ErrorB(Exception):
    pass
 
 
def conditional_error(switch):
    if switch:
        raise ErrorA
    raise ErrorB
 
 
try:
    conditional_error(True)
except ErrorA:
    print("error A caught!")
 
 
try:
    conditional_error(False)
except ErrorB:
    print("error B caught!")
 
 
conditional_error(False)

With the following output:

$ python test.py
error A caught!
error B caught!
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    conditional_error(False)
  File "test.py", line 11, in conditional_error
    raise ErrorB
__main__.ErrorB
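To contrast this with the generator behavior mentioned above, here is a small sketch in which a trace list (a hypothetical helper, added only for illustration) records which lines inside each function actually run. The generator resumes right below its yield; the raising function never gets past the raise.

```python
class ErrorA(Exception):
    pass


class ErrorB(Exception):
    pass


def conditional_error(switch, trace):
    trace.append("entered")
    if switch:
        raise ErrorA
    # Only reached when switch is False; never "resumed" after ErrorA.
    trace.append("below the if block")
    raise ErrorB


def resumable(trace):
    trace.append("before yield")
    yield
    # A generator does continue here on the next call to next().
    trace.append("after yield")
    yield


raise_trace = []
try:
    conditional_error(True, raise_trace)
except ErrorA:
    pass
print(raise_trace)  # ['entered']

gen_trace = []
gen = resumable(gen_trace)
next(gen)
next(gen)
print(gen_trace)  # ['before yield', 'after yield']
```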

numpy built with recent Intel compilers: MKL FATAL ERROR

I just spent hours trying to build numpy 1.8.0 with the most recent Intel MKL, Intel C++ Composer, and Intel Fortran Composer (Version 2013 SP1, from January 2014). numpy built fine, imported fine, and basic math worked fine. But one of the earlier tests (when running numpy.test()) triggered this error:

MKL FATAL ERROR: Cannot load libmkl_mc3.so or libmkl_def.so.

This error message is obviously not generated by the system, but written out by some MKL code. I double-checked that LD_LIBRARY_PATH was set correctly, and that these libraries were actually available. MKL itself seems to be confused about how to access these libraries.
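One way to repeat that availability check from within a Python process, i.e. with the same environment and dynamic loader that numpy sees, is a small ctypes sketch like the following. It is a diagnostic aid under my own assumptions, not a fix. The library names come from the error message, plus libmkl_rt.so (discussed below); a failure for libmkl_mc3.so or libmkl_def.so can also mean that their own dependencies could not be resolved when loaded in isolation, so read the error strings, not just the status.

```python
import ctypes
import os

# Names from the error message, plus the single dynamic runtime library
# (libmkl_rt.so); adjust the tuple for your MKL version.
MKL_LIBS = ("libmkl_rt.so", "libmkl_mc3.so", "libmkl_def.so")


def check_loadable(names=MKL_LIBS):
    """Try to dlopen() each library via ctypes; map name -> "ok" or error text."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError as exc:
            # Either the file was not found, or its own dependencies
            # could not be resolved when loaded in isolation.
            results[name] = str(exc)
        else:
            results[name] = "ok"
    return results


if __name__ == "__main__":
    print("LD_LIBRARY_PATH = %s" % os.environ.get("LD_LIBRARY_PATH", "<unset>"))
    for name, status in sorted(check_loadable().items()):
        print("%-16s %s" % (name, status))
```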

There is not much to find about this error; the two most important resources are

  • an Intel forum thread, where even the Intel engineers struggle to identify the problem. The user there solved his issue by accident; there was no logical solution,
  • and a StackOverflow answer saying “Intel people claim this was MKL library bug.”

More people have this kind of problem:

The Intel guys have a general “solution” to this class of problem: http://software.intel.com/en-us/articles/mkl-fatal-error-cannot-load-neither-xxxx-with-intel-mkl-10x — this article provides insights into how things should work, but did not really help in my scenario.

Some of the workarounds that magically solve this issue for some people involve explicitly declaring a list of libraries to link to (via mkl_libs in site.cfg in case of numpy), but that did not help in my case. According to http://software.intel.com/en-us/articles/numpy-scipy-with-mkl and other Intel resources, the best approach is to rely on mkl_rt for dynamic dependency resolution, so this is what I wanted to do. Most importantly, according to Intel, the mkl_rt method should prevent this error from happening (reference). This was also announced in a forum thread in 2010 (here):

“To who’m concerns,

The mkl dynamic library issue will be fixed thoroughly in MKL 10.3 and future version. A new librar mkl_rt was introduced. Now to build the custom MKL library based on dynamic MKL, one can easilyuse the library libmkl_rt.so instead of -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core

The new mkl version will be integrated to next intel Compiler version too, which targeted to be release in Nov.”

Yeah, that’s ironic. I do not want to say that it is impossible to get things running using recent Intel compiler suites, this statement would be wrong. But it is definitely tedious! See https://github.com/GalSim-developers/GalSim/issues/261. One quote from that thread:

Actually, this might be harder than I thought. I googled the error you were having and found a thread that ends with this comment:

“I remember now that I had the same problem recently – it is a
fundamental incompatibility between MKL and Python way of loading shared
libraries through dlopen. AFAIK, there is no solution to this problem,
except for using the static libraries.”

I’m not sure what they mean about using static libraries though, since python usually can’t handle static libraries.

In any case, the conclusion after this bit of research is that this error message results from a serious issue, and that a direct solution to it is not well documented. There is a simpler way to tackle this:

I went a few steps back and built with the 12.1.3 suite (MKL, icc, ifort). It just works, which suggests that there are actual issues with newer releases of MKL/icc/ifort.

I describe the build process in another blog post here.