gipc 1.0.0 release

More than three years after the 0.6.0 release I have published gipc version 1.0.0 today.

Quick links

Release highlights

This release focuses on reliability and platform compatibility. It brings along a number of changes relevant for running it on Windows and macOS, as well as for running it under PyPy.

Both, gevent 1.2 and 1.3 are now officially supported. On Linux, gipc now officially supports CPython 2.7, 3.4, 3.5, 3.6, PyPy2.7, and PyPy3. On Windows, gipc officially supports gevent 1.3 on CPython 2.7, 3.4, 3.5, 3.6, and 3.7. Support for gevent 1.1 and CPython 3.3 has been dropped.

The API did not change. In view of the stability of the API over the recent years I thought it is time to officially declare it as such, and to follow the semantic versioning spec‘s point 5: “Version 1.0.0 defines the public API” :-).

For this release most of the work went into

  • fixing a small number of platform-dependent bugs (this one was interesting, and this one was pretty insightful and ugly).
  • setting up a continuous integration (CI) pipeline for Linux and macOS (on Travis CI) as well as for Windows (via AppVeyor).
  • Re-writing and re-styling significant parts of the documentation: the new docs are online and can be found at https://gehrcke.de/gipc (for comparison: old docs).
  • moving the repository from Bitbucket to GitHub (I also migrated issues using this well-engineered helper).
  • making tests more stable.
  • running the example programs as part of CI, on all supported platforms (required a number of consolidations).

Acknowledgements

I would like to thank the following people who have helped with this release, be it by submitting bug reports, by asking great questions, with testing, or with a bit of code: Heungsub Lee, James Addison, Akhil Acharya, Oliver Margetts.

Changelog for this release

For the record, the complete changelog for this release copied from CHANGELOG.rst:

New platform support:

  • Add support for PyPy on Linux. Thanks to Oliver Margetts and to Heungsub Lee for patches.

Fixes:

  • Fix a bug as of which gipc crashed when passing “raw” pipe handles between processes on Windows (see issue #63).
  • Fix can't pickle gevent._semaphore.Semaphore error on Windows.
  • Fix ModuleNotFoundError in test_wsgi_scenario.
  • Fix signal handling in example infinite_send_to_child.py.
  • Work around segmentation fault after fork on Mac OS X (affected test_wsgi_scenario and example program wsgimultiprocessing.py).

Test / continuous integration changes:

  • Fix a rare instability in test_exitcode_previous_to_join.
  • Make test_time_sync more stable.
  • Run the example programs as part of CI (run all on Linux and Mac, run most on Windows).
  • Linux main test matrix (all combinations are covered):
    • gevent dimension: gevent 1.2.x, gevent 1.3.x.
    • Python implementation dimension: CPython 2.7, 3.4, 3.5, 3.6, PyPy2.7, PyPy3.
  • Also test on Linux: CPython 3.7, pyenv-based PyPy3 and PyPy2.7 (all with gevent 1.3.x only).
  • Mac OS X tests (all with gevent 1.3.x):
    • pyenv Python builds: CPython 2.7, 3.6, PyPy3
    • system CPython
  • On Windows, test with gevent 1.3.x and CPython 2.7, 3.4, 3.5, 3.6, 3.7.

Potentially breaking changes:

  • gevent 1.1 is not tested anymore.
  • CPython 3.3 is not tested anymore.

Atomically switch a directory tree served by nginx

This post briefly demonstrates how to atomically switch a directory tree of static files served by nginx.

Consider the following minimal nginx config file:

$ cat conf/nginx.conf 
events {
    use epoll;
}
 
http {
    server {
        listen 0.0.0.0:80;
        location / {
            root /static/current ;
        }
    }
}

The goal is to replace the directory /static/current atomically while nginx is running.

This snippet shows the directory layout that I started out with:

$ tree
.
├── conf
│   └── nginx.conf
└── static
    ├── version-1
    │   └── hello.html
    └── version-2
        └── hello.html

conf/nginx.conf is shown above. The static directory contains two sub trees, and the goal is to switch from version-1 to version-2.

For this demonstration I have started a containerized nginx from its official Docker image:

$ docker run -v $(realpath static):/static:ro -v $(realpath conf):/etc/nginx:ro -p 127.0.0.1:8088:80 nginx nginx -g 'daemon off;'

This mounts the ./static directory as well as the nginx configuration file into the container, and exposes nginx listening on port 8088 on the local network interface of the host machine.

Then, in the ./static directory one can choose the directory tree served by nginx by setting a symbolic link, and one can subsequently switch the directory tree atomically, as follows:

1) No symbolic link is set yet — leading to a 404 HTTP response (the path /static/current does not exist in the container from nginx’ point of view):

$ curl http://localhost:8088/hello.html
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.15.6</center>
</body>
</html>

2) Set the current symlink to serve version-1:

$ cd static
$ ln -s version-1 current && curl http://localhost:8088/hello.html
hello 1

3) Prepare a new symlink for version-2 (but don’t switch yet):

$ ln -s version-2 newcurrent

4) Atomically switch to serving version-2:

$ mv -fT newcurrent current && curl http://localhost:8088/hello.html
hello 2

In step (4) It is essential to use mv -fT which changes the symlink with a rename() system call. ln -sfn would also appear to work, but it uses two system calls under the hood and therefore leaves a brief time window during which opening files can fail because the path is invalid.

Final directory layout including the symlink current (currently pointing to version-2):

$ tree
.
├── conf
│   └── nginx.conf
└── static
    ├── current -> version-2
    ├── version-1
    │   └── hello.html
    └── version-2
        └── hello.html

Kudos to https://rcrowley.org/2010/01/06/things-unix-can-do-atomically.html for being a great reference.

How to raise UnicodeDecodeError in Python 3

For debugging work, I needed to manually raise UnicodeDecodeError in CPython 3(.4). Its constructor requires 5 arguments:

>>> raise UnicodeDecodeError()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: function takes exactly 5 arguments (0 given)

The docs specify which properties an already instantiated UnicodeError (the base class) has: https://docs.python.org/3/library/exceptions.html#UnicodeError

These are five properties. It is obvious that the constructor needs these five pieces of information. However, setting them requires knowing the expected order, since the constructor does not take keyword arguments. The signature also is not documented when invoking help(UnicodeDecodeError), which suggests that this interface is implemented in C (which makes sense, as it affects the bowels of CPython’s text processing).

So, the only true reference for finding out the expected order of arguments is the C code implementing the constructor. It is defined by the function UnicodeDecodeError_init in the file Objects/exceptions.c in the CPython repository. The essential part are these lines:

if (!PyArg_ParseTuple(args, "O!OnnO!",
     &PyUnicode_Type, &ude->encoding,
     &ude->object,
     &ude->start,
     &ude->end,
     &PyUnicode_Type, &ude->reason)) {
         ude->encoding = ude->object = ude->reason = NULL;
         return -1;
}

That is, the order is the following:

  1. encoding (unicode object, i.e. type str)
  2. object that was attempted to be decoded (here a bytes object makes sense, whereas in fact the requirement just is that this object must provide the Buffer interface)
  3. start (integer)
  4. end (integer)
  5. reason (type str)

Hence, now we know how to artificially raise a UnicodeDecodeError:

>>> o = b'\x00\x00'
>>> raise UnicodeDecodeError('funnycodec', o, 1, 2, 'This is just a fake reason!')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'funnycodec' codec can't decode byte 0x00 in position 1: This is just a fake reason!

Download article as PDF file from Elsevier’s ScienceDirect via command line (curl)

When not in the office, we often times cannot directly access scientific literature, because access control is usually based on IP addresses. However, we usually have SSH access to the university network. Being logged in to a machine in the university network we should — in theory — be able to access a certain article. Most of the times it is the PDF file that we are interested in and not the “web page” corresponding to an article. So, can’t we just $ curl http://whatever.com/article.pdf to get that file? Most of the times, this does not work, because access to journal articles usually happens through rather complex web sites, such as Elsevier’s ScienceDirect:

ScienceDirect is a leading full-text scientific database offering journal articles and book chapters from nearly 2,500 journals and 26,000 books.

Such web sites add a considerable amount of complexity to the technical task of downloading a file. The problem usually starts with obtaining the direct URL to the PDF file. Also, HTTP redirection and cookies are usually involved. Often times, the only solution people see to solve these issues is to set up a VPN and then to use a fully fledged browser through that VPN, and let the browser deal with the complexity.

However, I prefer to get back to the basics and always strive to somehow find a direct URL to the PDF file to then download it via curl or wget.

This is my solution for Elsevier’s ScienceDirect:

Say, for instance, you wish to download the PDF version of this article: http://www.sciencedirect.com/science/article/pii/S0169433215012131

Then all you need is that URL and the following commands executed on a common Linux system:

export SDURL="http://www.sciencedirect.com/science/article/pii/S0169433215012131"
curl -Lc cookiejar "${SDURL}" | grep pdfurl | perl -pe 's|.* pdfurl=\"(.*?)\".*|\1|' > pdfurl
curl -Lc cookiejar "$(cat pdfurl)" > article.pdf

The method first parses the HTML source code of the main page corresponding to the article and extracts a URL to the PDF file. At the same time, it also stores the HTTP cookie(s) set by the web server when accessing named web page. These cookies are then re-used when accessing the PDF file directly. This has reproducibly worked for me.

If it does not work for you, I recommend having a look into the file pdfurl and see if that part of the process has lead to a meaningful result or not. Obviously, the second step can only succeed aver having obtained a proper URL to the PDF file.

This snippet should not be treated as a black box. Please execute it in an empty directory. Also note that this snippet only works subject to the condition that ScienceDirect keeps functioning the way it does right now (which most likely is the case for the next couple of months or years).

Don’t hesitate to get back to me if you have any questions!

gipc 0.6.0 released

I have just released gipc 0.6.0 introducing support for Python 3. This release has been motivated by gevent 1.1 being just around the corner, which will also introduce Python 3 support.

Changes under the hood

The new gipc version has been verified to work on CPython 2.6.9, 2.7.10, 3.3.6, and 3.4.3. Python 3.4 support required significant changes under the hood: internally, gipc uses multiprocessing.Process, whose implementation drastically changed from Python 3.3 to 3.4. Most notably, on Windows, the arguments to the hard-coded CreateProcess() system call were changed, preventing automatic inheritance of file descriptors. Hence, implementation of a new approach for file descriptor transferral between processes was required for supporting Python 3.4 as well as future Python versions. Reassuringly, the unit test suite required close-to-zero adjustments.

The docs got a fresh look

I have used this opportunity to amend the documentation: it now has a fresh look based on a brutally modified RTD theme. This provides a much better reading experience as well as great support for narrow screens (i.e. mobile support). I hope you like it: https://gehrcke.de/gipc

Who’s using gipc?

I have searched the web a bit for finding interesting use cases. These great projects use gipc:

Are you successfully applying gipc in production? That is always great to hear, so please drop me a line!

Availability

As usual, the release is available via PyPI (https://pypi.python.org/pypi/gipc). Please visit https://gehrcke.de/gipc for finding API documentation, code examples, installation notes, and further in-depth information.