Category Archives: Companies & Products

Allen & Heath Xone:23C — hidden technical detail and quirks

The brand new Allen & Heath Xone:23C has been presented in countless preview videos and smaller reviews, all mentioning the main features of this great device. I have just obtained mine. There are some important details to know about, which are not mentioned in the documentation and in the reviews I have found so far. I want to share these non-obvious technical details with you, particularly regarding the USB sound hardware that is built into the Xone:23C.

The Xone:23C is a true 3+3 channel mixer

The Xone:23 (without the C) is a 2+2 channel mixer: it has two main channels, each of which can be fed from two analogue sources with independent gain controls. Still, Chris Brackley from DJ TechTools found that the Xone:23 has too few input options with only two true stereo line inputs, because the other two stereo inputs are made for low-voltage (phono) sources and cannot be switched to line level without hacking the device. The Xone:23C adds two stereo channels that are fed via USB, effectively making it a 3+3 channel mixer. However, the Internet resources available so far, and especially the manual, do not explain explicitly how to toggle between USB and line/phono sound. The most obvious observation is that there is no switch for this.

The circuit diagram at the end of the manual explains the behavior. I have taken a screenshot from the relevant part and labeled a few components:


The USB audio is processed and converted (to an analogue signal) by the USB sound card block shown in the diagram. This block has two stereo outputs (send 1+2 and send 3+4). I have marked these two stereo channels with blue arrows. This is where your USB audio starts its way into the mixer after being converted to an analogue signal.

The important components now are the summing amplifiers, which I have labeled with green circles (what looks like an M or W is actually a capital Greek sigma, a symbol commonly used for summation). The circuit diagram tells us that each main channel mixes the stereo signals from phono, from line, and from USB, in equal parts. Phono and line have their own hardware gain controls on the mixer for controlling the mix ratio. Such a control is missing for the USB audio stream. But it is not needed: the volume of the USB audio can easily be controlled digitally, in the source (your computer).

One of the first things I tried when I got the mixer was to attach a line device and USB audio at the same time to the same main channel. Indeed, both audio sources are mixed into each other, and the loudness of the line signal and the USB signal can be set independently. Hence, the Xone:23C is a true 3+3 channel mixer. No need to toggle between line/phono and USB.

Keep the digital master output low enough!

Obviously, I choose to mix externally with the Xone:23C, using the ASIO drivers for transporting the audio signal from within Traktor to the USB sound card in the mixer. For tracks that are mastered quite loudly, the default master output volume of Traktor is too high: the signal is already clipping and goes into the reds on the VU meter of the mixer. Add some EQ effects or some HPF/LPF with resonance, and your signal becomes horribly distorted. I found that with a Traktor master output volume set to somewhere between -5 dB and -10 dB, the Xone:23C meters stay around 0 dB most of the time for normal parts of most tracks I listened to, whereas the signal increases to at most +6 dB for especially loud parts in a song, or when some effects are added.
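To get a feeling for what these decibel values mean for the digital signal, here is a minimal Python 3 sketch that converts a level change in dB into a linear amplitude factor. The function names are made up for this illustration, and the sketch only covers the textbook amplitude relation; it says nothing about how Traktor or an operating system volume slider maps its scale internally.

```python
import math

def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def gain_to_db(gain):
    """Convert a linear amplitude factor back to decibels."""
    return 20.0 * math.log10(gain)

# The master output range mentioned above, plus 0 dB as reference.
for db in (0, -5, -10):
    print("%+d dB -> amplitude factor %.3f" % (db, db_to_gain(db)))
```

A master output of -10 dB scales the sample amplitudes to roughly a third, which leaves headroom for EQ boosts and filter resonance before the signal clips.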

If you are playing audio on the mixer from a music player that does not use ASIO, but the normal audio driver of your operating system, I found that a master volume of about 60 % to 70 % is low enough to avoid clipping the signal. If this is set to 100 %, as it usually is, you are already in the reds. Bad.

USB audio from the mixer to the computer.

The USB sound card in the Xone:23C provides two output stereo channels (from the computer into the mixer) and two input stereo channels (from the mixer into the computer). The usage of the output channels is obvious: get sound into the mixer. Each of the two input channels plays a special role, and this information is rather hidden in the manual. The mixer has an analogue stereo RCA record output, for capturing the main mix with an analogue recording device. USB input channels 1 and 2 carry the same signal, just digitally. Hence, you can easily use your computer to record the master output of the Xone:23C, with no additional hardware and through the same USB cable that is connecting the mixer to your computer anyway. This is great.

The mixer also has an analogue effects unit stereo output. USB input channels 3 and 4 are the same, just digitally. Hence, you can use software for capturing this input (e.g. in Ableton), generate a corresponding effect output, and feed this one back into the FX input of the Xone:23C. The latter, however, requires additional hardware (another sound device that generates an analogue signal), because there is no digital FX input into the mixer.

Recording only works through ASIO so far

There seems to be one caveat with the USB recording function, at least on Windows. The Xone:23C presents a Line-In WDM recording device, for recording the master mix. However, I was not able to access this device while another program was playing back through ASIO. Simultaneous playback and recording only seem to work through the ASIO interface.

Audacity (like many other popular open source tools) does not support ASIO (ASIO is a proprietary interface, and GPL-licensed software must not legally be distributed in binary form with ASIO support built in). Audacity could, on the other hand, record through the Xone:23C Line-In WDM device. However, as stated above, this device cannot be accessed while e.g. Traktor feeds the Xone:23C with audio data through ASIO. In other words, Audacity cannot be used for recording the master mix through the Xone:23C WDM Line-In device while Traktor plays back through the Xone:23C ASIO interface. Opening the WDM device in this scenario results in an error saying that the device cannot be accessed. What does work is recording via the Xone:23C ASIO driver through e.g. Traktor or other commercial software.

Recording the master mix from within Traktor, however, is not totally straightforward. One needs to define an external input source for a normal track deck (e.g. deck A). This input source must be channels 1+2 of the Xone:23C ASIO input. As long as you do not switch deck A to be of type “live deck”, this input effectively is a no-op input (it does not end up in the output again). Now you can switch to external recording mode and choose deck A as the input source. Don’t worry, deck A still behaves as a normal track deck, it is just misused for this workaround.

Issues with playback on one of my platforms

I have tested the Xone:23C’s internal ASIO sound hardware with two laptops. Both have Windows 7 Professional installed. One has a 64 bit architecture and operating system, the other a 32 bit architecture and operating system. I have installed the Xone:23C ASIO drivers, specifically the 32 bit version for the 32 bit OS/laptop, and the 64 bit version for the 64 bit OS/laptop. On the 64 bit system, the audio chain (playback software -> ASIO driver -> USB audio interface) behaves as expected. On the 32 bit system I have observed infrequent crackling sounds in the output.

The 32 bit system is a fresh and clean Windows operating system install, and the driver is the “Xone:23C Windows 32bit Driver V2.9.65”. I tried different setups, all without success. Important examples that I tried:

  • Foobar audio player to Xone:23C audio WDM device with small and large buffer sizes
  • Traktor 2.6.8 output to Xone:23C ASIO driver, with small and large buffer sizes
  • Traktor 2.6.8 output to ASIO4ALL driver, with small and large buffer sizes

In all cases, the crackling appears and seems to be independent of the buffer size. The crackling is not very prominent: it appears roughly every 10 seconds and is rather quiet. I tried different USB ports, re-installing the driver, and a couple of other things, but could not get rid of the crackling. The same Xone:23C attached to the 64 bit machine works perfectly. My 32 bit laptop has an Intel P8800 CPU, i.e. it is definitely not too weak, and playback from Foobar right to the WDM device does not require much CPU power at all. It could be a problem with the 32 bit driver (I have submitted a support ticket to A&H), but it could also be a quirk of this specific platform, where one of the drivers (e.g. ACPI or USB) is causing high latencies. I have to investigate further. It would be great if you could report whether you got the Xone:23C USB audio working properly on a 32 bit Windows system.

Mojibake: Beatport’s ID3 text encoding is broken

Mojibake is a name for garbled text, arising from systematic errors along a text encoding-transfer-decoding chain. What does it have to do with Beatport? This:


This is a screenshot from a playlist of the VLC player, showing MP3 meta data. I downloaded the corresponding track from Beatport. Garbage is displayed where the German Umlaut “Ü” should appear. Why is that? Does the player not support the meta data version, or more specifically the meta data encoding used by Beatport MP3s?

After some investigation I found that Beatport provides MP3 files with invalid meta data. The invalid meta data is the result of a tremendously flawed text encoding procedure in the bowels of Beatport, where text is first encoded via UTF-8, the resulting raw binary data is then interpreted as a unicode code point sequence, and this sequence is subsequently encoded via UTF-8 again. Horrific, and unsurprisingly the outcome is garbage. The invalid title tag shown above can easily be fixed in Python:

>>> from mutagen.id3 import ID3, TIT2
>>> data = ID3("test.mp3")
>>> corrected_title = unicode(data["TIT2"]).encode('raw_unicode_escape').decode("utf-8")
>>> data.add(TIT2(encoding=3, text=corrected_title))
>>> data.save()

You do not need to understand that code right now. In the following paragraphs I will explain the issue step by step and slowly work towards this solution. The issue is the result of another developer (team?) not taking enough care of character encodings, although this topic is one of the most important in modern information technology, and ignorance in this regard has led to tons of bugs in a plethora of software projects. It is time to refer to Joel’s article “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)” again, which you may want to read later on, if you have not done so yet.

Raw data in the ID3 tag

Meta data in MP3s is usually stored within an ID3 meta data container, as explained on Wikipedia and laid out in the ID3 specification. Different versions of this container format specification are available. First of all, let us find out which ID3 tag version the MP3 files from Beatport use. I have renamed the Beatport MP3 file in question to test.mp3. The following snippet shows the first five bytes of the file:

$ hexdump -C -n 5 test.mp3
00000000  49 44 33 04 00                                    |ID3..|

Quote from the ID3v2 specification: “The first three bytes of the tag are always "ID3", to indicate that this is an ID3v2 tag, directly followed by the two version bytes. The first byte of ID3v2 version is its major version, while the second byte is its revision number.” Hence, this MP3 file contains an ID3 tag in version 2.4.0.
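The same check can be written down in a few lines of code. The following is a minimal Python 3 sketch (the rest of this article uses Python 2 sessions); the function name is made up for this illustration, and the input is just the five bytes from the hexdump above.

```python
def parse_id3v2_version(header):
    """Return (major, revision) from the first five bytes of an ID3v2 tag."""
    if len(header) < 5 or header[:3] != b"ID3":
        raise ValueError("no ID3v2 tag at the start of the data")
    # Indexing a bytes object in Python 3 yields integers.
    return header[3], header[4]

# The five bytes shown by hexdump above.
first_bytes = bytes.fromhex("49 44 33 04 00")
major, revision = parse_id3v2_version(first_bytes)
print("ID3v2.%d.%d" % (major, revision))
```

For a real file, the five bytes would come from `open("test.mp3", "rb").read(5)` instead of the hard-coded hex string.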

The ID3 data is composed of frames. For example, the so-called TIT2 frame is designed to contain the track title. I have used hexdump to look for that frame within the first kilobytes of the MP3 file (the ID3 tag may also contain image data, so the size of the entire ID3v2 container can be several kilobytes). The following partial dump shows all the bytes belonging to the TIT2 frame in this file, as well as some bytes before and after it.

00004900  49 54 31 00 00 00 08 00  00 03 4b 6f 6d 70 61 6b  |IT1.......Kompak|
00004910  74 54 49 54 32 00 00 00  1d 00 00 03 c3 83 c2 9c  |tTIT2...........|
00004920  62 65 72 73 70 72 75 6e  67 20 28 4f 72 69 67 69  |bersprung (Origi|
00004930  6e 61 6c 20 4d 69 78 29  54 4b 45 59 00 00 00 05  |nal Mix)TKEY....|

Text encoding in ID3 v2.4.0

It is clear that the above dump contains the track title in encoded form (there always is some kind of text encoding; there is no such thing as plain text, and this should not surprise you). What is the exact format of the piece of data shown above? Which character encodings does the ID3 v2.4.0 specification allow for? Is the encoding itself specified in the file? Let’s have a look at the specification. These are the relevant parts:

   All ID3v2 frames consists of one frame header followed by one or more
   fields containing the actual information. The header is always 10
   bytes and laid out as follows:
     Frame ID      $xx xx xx xx  (four characters)
     Size      4 * %0xxxxxxx
     Flags         $xx xx

   The frame ID is followed by a size descriptor containing the size of
   the data in the final frame, after encryption, compression and
   unsynchronisation. The size is excluding the frame header ('total
   frame size' - 10 bytes) and stored as a 32 bit synchsafe integer.
   In the frame header the size descriptor is followed by two flag
   bytes. These flags are described in section 4.1.

What follows is the isolated frame data, i.e. all raw bytes belonging to the TIT2 frame (nothing else prepended or appended):

   54 49 54 32 00 00 00  1d 00 00 03 c3 83 c2 9c  |TIT2...........|
62 65 72 73 70 72 75 6e  67 20 28 4f 72 69 67 69  |bersprung (Origi|
6e 61 6c 20 4d 69 78 29                           |nal Mix)|

  • Frame ID: 54 49 54 32. This is the TIT2 label, indicating that this is the frame containing information about the track title.
  • Size: 00 00 00 1d. This is 29 (Python: int("0x1d", 0)). You can count for yourself, there are 39 bytes shown in the dump above, and the ID3 specification says that the frame size is the total frame size minus 10 bytes, so that fits.
  • Flags: 00 00. No flags.
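The manual bookkeeping above can be double-checked with a short Python 3 sketch. It decodes the 10-byte frame header, including the synchsafe size field (7 significant bits per byte, as the `4 * %0xxxxxxx` notation in the specification indicates). The function names are made up for this illustration; the bytes are the isolated TIT2 frame from the dump.

```python
def synchsafe_to_int(data):
    """Decode a synchsafe integer: 7 significant bits per byte, top bit zero."""
    value = 0
    for byte in data:
        if byte & 0x80:
            raise ValueError("not a synchsafe integer")
        value = (value << 7) | byte
    return value

def parse_frame_header(frame):
    """Split the 10-byte ID3v2.4 frame header into (frame ID, size, flags)."""
    frame_id = frame[0:4].decode("ascii")
    size = synchsafe_to_int(frame[4:8])
    flags = frame[8:10]
    return frame_id, size, flags

# The isolated TIT2 frame from the dump above.
frame = bytes.fromhex(
    "54 49 54 32 00 00 00 1d 00 00 03 c3 83 c2 9c "
    "62 65 72 73 70 72 75 6e 67 20 28 4f 72 69 67 69 "
    "6e 61 6c 20 4d 69 78 29"
)
frame_id, size, flags = parse_frame_header(frame)
print(frame_id, size, len(frame))
```

For sizes below 128 a synchsafe integer and a plain integer are identical, which is why reading 1d directly as 29 works here. For larger frames the distinction matters.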

What about text encoding? This is specified in section 4.2 of the specification:

   All the text information frames have the following format:
     <Header for 'Text information frame', ID: "T000" - "TZZZ",
     excluding "TXXX" described in 4.2.6.>
     Text encoding                $xx
     Information                  <text string(s) according to encoding>

The specification also informs us about possible encodings:

   Frames that allow different types of text encoding contains a text
   encoding description byte. Possible encodings:
     $00   ISO-8859-1 [ISO-8859-1]. Terminated with $00.
     $01   UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
           strings in the same frame SHALL have the same byteorder.
           Terminated with $00 00.
     $02   UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
           Terminated with $00 00.
     $03   UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.

In the raw data above, after frame type, size and flags we see a 03 byte. According to the specification above, this byte means that the following text is encoded in UTF-8. Hence, the file itself tells us that it contains the title tag encoded in UTF-8.
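As a small illustration of how a reader of the tag would use this byte, here is a Python 3 sketch that maps the encoding byte to a Python codec name and decodes a frame body. The dictionary and function names are made up for this example, string terminators and multiple strings per frame are ignored, and the sample body is taken from the frame that precedes TIT2 in the dump above (encoding byte 03 followed by the bytes of “Kompakt”).

```python
# Encoding byte -> Python codec name, following the table quoted above.
ID3V24_CODECS = {
    0x00: "iso-8859-1",
    0x01: "utf-16",      # with BOM
    0x02: "utf-16-be",   # without BOM
    0x03: "utf-8",
}

def decode_text_frame_body(body):
    """Decode a text frame body: one encoding byte followed by the text."""
    codec = ID3V24_CODECS[body[0]]
    return codec, body[1:].decode(codec)

# Body of the frame right before TIT2 in the dump: 03 4b 6f 6d 70 61 6b 74
body = bytes.fromhex("03 4b 6f 6d 70 61 6b 74")
codec, text = decode_text_frame_body(body)
print(codec, text)
```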

What follows is the byte representation of the title text, extracted from the dump shown above (frame header and text encoding marker removed). It is important to note that the following byte sequence has been created by Beatport (bytes shown in hex representation, as before):

c3 83 c2 9c 62 65 72 73 70 72 75 6e 67 20
28 4f 72 69 67 69 6e 61 6c 20 4d 69 78 29

Now, just decode this raw byte sequence using the UTF-8 codec and we have our title, right? Let’s see.

Decoding the raw title data: something is wrong.

Using the \x prefix, we can easily get the raw data just shown (which should encode the title text) into a Python (2) byte string:

>>> raw = "\xc3\x83\xc2\x9c\x62\x65\x72\x73\x70\x72\x75\x6e\x67\x20\x28\x4f\x72\x69\x67\x69\x6e\x61\x6c\x20\x4d\x69\x78\x29"

The ID3 tag itself makes us believe that the original text has been encoded using UTF-8, so in order to retrieve the original text, this operation needs to be inverted. This is easily done in Python, by calling the decode() method on a byte string, providing the codec to be used:

>>> raw.decode("utf-8")
u'\xc3\x9cbersprung (Original Mix)'

The data type returned by this operation is a unicode string, i.e. a sequence of characters, not bytes. And this sequence of characters looks flawed. What is that \xc3\x9c thing there, actually? Does it make sense? To be clarified in the next section.

Reverse-engineering the issue

First, let us verify what happened here. We decoded a raw byte sequence via UTF-8 and retrieved two weird unicode code points in the output. This is the inverse process, starting from the two unexpected unicode code points C3 and 9C:

>>> u"\xc3\x9c".encode("utf-8")
'\xc3\x83\xc2\x9c'

The Python code above defines a sequence of unicode code points, and then encodes this “text” using UTF-8, yielding the very same byte sequence contained in the Beatport ID3 raw data which we have seen before. Now we know which “text” they encoded in order to create the meta data in the file they provide for download. But what is that text? We are still missing the German umlaut Ü here, aren’t we? Let us look at the common character representation of these code points:

>>> print u"\xc3\x9c"

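By having a look at a Unicode code chart we can clarify what the code points C3 and 9C really represent:

  • U+00C3 is LATIN CAPITAL LETTER A WITH TILDE (Ã).
  • U+009C is STRING TERMINATOR, a control character without a glyph.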


The print statement above attempted to display these characters on my terminal. The A with tilde appears as expected, followed by a rectangle (you might or might not see that here), representing a control character.

So now we have identified the actual text that Beatport encoded as UTF-8 and saved in the file as raw byte sequence. The VLC player in the figure at the top is behaving correctly: it decodes this byte sequence using UTF-8 and just displays the resulting characters: the A with the tilde and the control character, which has no glyph, and which is therefore represented with a rectangle.

The question left is: why does Beatport encode invalid text in the first place?

The magic of encoding text multiple times.

When you regularly deal with character encodings you probably have an idea already. I had a suspicion. The correct title text starts with a capital German Umlaut Ü. The unicode codepoint for Ü actually is 00DC. What is the raw byte sequence representation of this code point when using the UTF-8 codec?

>>> u"Ü".encode("utf-8")
'\xc3\x9c'
>>> u"\xdc".encode("utf-8")
'\xc3\x9c'

Right. It is c3 9c in hex notation. You have seen that a minute ago. Confused? Above, we learned that code points C3 and 9C were considered part of the original text, which was then encoded to its UTF-8 representation, i.e. the UTF-8 representations of the characters U+00C3 and U+009C ended up in the raw data. Now, we have learned that the two bytes c3 9c actually encode the character U+00DC in UTF-8. Still confused?


The original text was encoded twice, whereas the raw byte string representation after the first encoding was erroneously interpreted as unicode code point sequence.

Reproduction of Beatport’s broken text encoding

Let us reproduce this step by step. First, we encode U+00DC (the German Umlaut Ü) to UTF-8:

>>> u"\xdc".encode("utf-8")
'\xc3\x9c'

Now it is time to go into detail of defining unicode literals in Python 2: with the u in front of the literal, Python is instructed to parse the characters in the literal as unicode code points. One code point can be given with different methods. The first 256 unicode code points (there are many more!) can be given in hex notation. This is what happens above, the \xdc is the U+00DC code point in hex notation.

The output of the above call to encode() is a raw byte string, where the bytes are shown in hex notation. Now we can go ahead and attach a u in front of the raw byte string. This little prefix fundamentally changes the meaning of this string literal. Now, the hex notation does not describe single raw bytes anymore, it describes unicode code points. The two resulting entities are entirely unrelated:

>>> print '\xc3\x9c'
>>> print u'\xc3\x9c'

The output of the two statements has nothing meaningful in common, conceptually. The first is a byte string, implicitly decoded via the UTF-8 codec by my terminal (careful, that is magic!). The second is a sequence of two unicode code points.

This is like saying “hey, give me item number 156 and 195 from that shelf there, and then also give me number 156 and 195 from the other shelf over there”, whereas the shelves contain entirely different things. All these two statements have in common is the way the “numbers” are represented in hex notation.

It does not matter which programming language Beatport is using for creating the ID3 meta data, but somehow they managed to do a very weird thing: after having the text encoded in UTF-8 (technically it could also have been Latin-1, as Thomas pointed out in the comments, but that is not that likely), they

  • re-interpret that binary data (most likely in hex representation) as a unicode code point sequence
  • and re-encode this unicode code point sequence with UTF-8.

With our small example, this is the process:

# Encode the text with UTF-8.
>>> u"Ü".encode("utf-8")
'\xc3\x9c'
# Take the hex representation of the raw byte sequence and
# re-interpret it as unicode code point sequence. Encode this
# with UTF-8 again.
>>> u'\xc3\x9c'.encode("utf-8")
'\xc3\x83\xc2\x9c'

The latter is exactly the invalid raw byte sequence found in the ID3 meta data of Beatport’s MP3 file. The last step in reproducing the entire encoding-transfer-decoding chain is to do what an MP3 player would do: decode that data using UTF-8 and display the corresponding characters:

>>> print '\xc3\x83\xc2\x9c'.decode("utf-8")

The above is exactly what happens within e.g. VLC player or any other player that properly parses the ID3 tag data.
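For readers on Python 3, where byte strings and text are strictly separate types, the same chain can be sketched as follows. This is only an illustration of the mechanism, not a claim about how Beatport's software is written. The step of "putting a u in front of the byte string" is modelled here by decoding via Latin-1, which maps every byte value to the code point with the same number.

```python
original = "\u00dc"                   # the text: Ü (U+00DC)

# Step 1: the correct encoding.
once = original.encode("utf-8")       # b'\xc3\x9c'

# Step 2: the mistake. The two bytes are taken for the two code points
# U+00C3 and U+009C (Latin-1 decoding keeps the numbers, changes the meaning).
misread = once.decode("latin-1")

# Step 3: the misread text is encoded with UTF-8 a second time.
twice = misread.encode("utf-8")       # b'\xc3\x83\xc2\x9c'

# What a player does: decode the tag data once with UTF-8.
displayed = twice.decode("utf-8")

print(once, twice, [hex(ord(c)) for c in displayed])
```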

Indeed, this is Beatport’s fault. Within the entire process of text processing, one needs to be aware of the actual representation of the text. At some point in Beatport’s text processing, a developer assumed text to be a Unicode sequence object, whereas it really was a UTF-8-encoded byte string. The conceptual lesson is: never make assumptions about the text representation in your code. Always take control of the data and be 100 % sure about the type of text data you are handling.

Otherwise millions of MP3 downloads will be erroneous.

A systematic fix based on raw_unicode_escape

The process that led to the erroneous raw byte sequence is now well-understood. Fortunately, this process does not involve any loss of information. The information is just in bad shape. With the help of some Python magic we can invert that process.

The issue is that the byte sequence \xc3\x9c was interpreted as unicode code point sequence, yielding the raw byte sequence \xc3\x83\xc2\x9c after encoding. The Python codec raw_unicode_escape can invert this (kudos to this SO thread):

>>> u'\xc3\x9c'.encode('raw_unicode_escape')
'\xc3\x9c'

Couldn’t we just have taken away the u? Yes. It is that simple. Manually. But using .encode('raw_unicode_escape') is the only straight-forward automatic procedure to achieve the same effect: keep the item representation, change the item meaning from unicode code points to raw bytes.

Likewise, the invalid raw byte sequence can be fixed using this technique:

>>> raw = '\xc3\x83\xc2\x9c'
# Decode the byte sequence to a unicode object.
>>> raw.decode("utf-8")
u'\xc3\x9c'
# Encode this unicode object, while keeping the item "numbering".
# This yields the UTF-8-encoded text as it was before Beatport
# corrupted it.
>>> raw.decode("utf-8").encode('raw_unicode_escape')
'\xc3\x9c'
# Decode that text.
>>> raw.decode("utf-8").encode('raw_unicode_escape').decode("utf-8")
u'\xdc'

As you remember, the code point U+00DC is the Ü. Great! All put together, and printed:

>>> print '\xc3\x83\xc2\x9c'.decode("utf-8").encode('raw_unicode_escape').decode("utf-8")
Ü

Yes, that’s it: the Ü is restored from the invalid byte sequence, using the knowledge derived above.
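In Python 3 the same repair can be sketched with the Latin-1 codec in place of raw_unicode_escape. This is a swapped-in technique, not the one used in the Python 2 sessions of this article: for code points below 256 both codecs produce the same bytes, but Latin-1 raises an error for anything above, which the hypothetical helper below uses as a hint that a string is not double-encoded and should be left alone.

```python
def undo_double_utf8(text):
    """Try to revert one round of 'UTF-8 bytes misread as code points'.

    Returns the input unchanged if it does not look double-encoded.
    """
    try:
        # Code points -> bytes with the same numbers -> proper UTF-8 decoding.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

broken = b"\xc3\x83\xc2\x9cbersprung (Original Mix)".decode("utf-8")
print(undo_double_utf8(broken))

# A title that is already correct passes through untouched.
print(undo_double_utf8("\u00dcbersprung (Original Mix)"))
```

The guard is a heuristic: pure ASCII text and most correct non-ASCII text survive unchanged, but a correct title that happens to consist of characters forming a valid UTF-8 byte sequence would be altered, so results should be reviewed before writing them back to files.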

Fix the title in an MP3 file using Mutagen

There is an awesome Python module called Mutagen for handling audio file meta data. First of all, let us use Mutagen for directly and comfortably accessing the title data in our MP3 file:

>>> from mutagen.id3 import ID3
>>> data = ID3("test.mp3")
>>> title = data["TIT2"]
>>> title
TIT2(encoding=3, text=[u'\xc3\x9cbersprung (Original Mix)'])
>>> unicode(title)
u'\xc3\x9cbersprung (Original Mix)'

In the above code, unicode(title) yields the same as raw.decode("utf-8") in the section before. Starting from there, we can apply our systematic fix. Loading a Beatport MP3 file, retrieving the title tag, and generating the proper title text in one line:

>>> print unicode(ID3("test.mp3")["TIT2"]).encode('raw_unicode_escape').decode("utf-8")
Übersprung (Original Mix)

All in all, load an MP3 file, generate the corrected title from the invalid one, and save the corrected title back to the file:

>>> from mutagen.id3 import ID3, TIT2
# Load ID3 meta data from MP3 file.
>>> data = ID3("test.mp3")
# Build corrected title.
>>> corrected_title = unicode(data["TIT2"]).encode('raw_unicode_escape').decode("utf-8")
# Update ID3 data object with corrected title.
>>> data.add(TIT2(encoding=3, text=corrected_title))
# Write updated ID3 data to MP3 file.
>>> data.save()

After pulling that file into the player, we see that the title issue is fixed:


How to fix all ID3 text frames in all files.

We could now assume that Beatport is making the same mistake with all ID3 text frames. Actually, I have seen invalid Artist strings. Obviously, the task would then be to iterate through a collection of files, and for each file iterate through all ID3 text frames and fix them as shown above. Since I am not sure about the assumption stated before, I will not show the corresponding code here. I think you will manage to do that in case you have a collection of broken files from Beatport and know at least some Python. If not, it is a good exercise :-). But back up your MP3 files first!

Travis CI finally supports Python 3.4

Python 3.4 was released over a month ago. According to this announcement, Travis CI will finally support Python 3.4 in only a few hours. This has been long awaited by the community, given the many "+1" postings in Travis CI issue 1989 and the countless "Add 3.4 to .travis.yml"-style commit messages referencing this issue.

Many believe that Python 3.4 will be the breakthrough for Python 3 and we can expect it to become quite popular. Although Python 2.7 security and bug fixes have recently been “guaranteed” until 2020 by Guido, I got the impression that the dominance of Python 2.7 is finally decreasing, slowly but steadily. For developers in the open source community this means that Python 3.4 compatibility is an important target to aim for now (you might even want to ignore all releases up to 3.3).

By the way, Ubuntu has made Python 3.4 the default Python 3 in their recently released 14.04 LTS (which will be supported for 5 years). They even considered shipping it as the default Python, which they did not do in the end. Their recommendation, however, is

“to best support future versions of Ubuntu you should consider porting your code to Python 3”

So, go ahead, use the great Travis CI and make your code run on both Python 2.7 and Python 3.4!

FreeNAS buries Perl in favor of Python

I am a happy user of FreeNAS (a great open source storage server solution) and sporadically follow its development. A couple of months ago, William Grzybowski committed revision 22ebffb6 to the FreeNAS code repository. He crafted a lovely commit message:

Dear perl,

You’re very brave, you have been fighting against us for a long, long time.
The time has come to tear you apart and bury you very deep.

Rest In Peace

Indeed, the FreeNAS team chose to build their management system on top of CPython (2.7, in this case). A great choice in favor of development efficiency and community contributions, I guess.

A brief interjection on the subject of WotzApp

A new star in the sky of billion-dollar corporations. One should not praise WhatsApp so much for being simple to use and for working. Others can do that, too. Other things matter. You may remember: until recently it was very easy to send WhatsApp messages in other people's names. That has become harder, but it is still possible. What is actually certain about WhatsApp?

  • The user is the product.
  • Data protection has low priority.

The first point has been clear since 19 February 2014 at the latest. The second point: for example, messages sent over the local Wi-Fi can be read along with relatively simple means. There are surely many other smaller and larger privacy problems, but all of these things interest only a small fraction of the general public.

WhatsApp: no standards, no security. Simply 90s-style chat for the phone.

In the beginning, the WhatsApp engineers worked quick and dirty and did not align the basics of their architecture with established standards. IT security and data protection without adhering to standards? That is ill-posed in the mathematical sense from the outset. We have been aware of the carelessness in WhatsApp's technical implementation for years. It is clear to us what WhatsApp is at its core: a very simple, totally indifferent form of chat. Security and data protection do not matter at all. The following had already impressed me years ago: you do not have to “log in” to WhatsApp, there is no (shared) secret.


You do not need to look into WhatsApp any further to be deeply skeptical about its data protection. People write to each other without exchanging a secret beforehand. Time for a very short mini excursion into cryptography; maybe I will reach a narrow mass, that is, a small part of the broad masses. A secret is something that only YOU know. Your phone number is not a secret. Your IMEI is not a secret either. Without a secret, one can

  • not authenticate securely (prove one's identity unambiguously),
  • not encrypt data securely,
  • not guarantee the integrity of sent data.

Conversely, this means for communication without a secret:

  • Anyone can (with more or less effort) send messages in your name.
  • Anyone on the communication path between you and the recipient (Wi-Fi, ISP, …) can read your messages.
  • Anyone on the communication path between you and the recipient (Wi-Fi, ISP, …) can modify your messages.
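To make the role of the secret a bit more tangible, here is a minimal Python 3 sketch of message authentication with a shared secret, using HMAC from the standard library. It only illustrates the integrity and authenticity points from the lists above (not encryption), and the secret, the messages, and the function names are made up for this example; it does not describe how any of the messengers mentioned here work.

```python
import hashlib
import hmac

def sign(secret, message):
    """Compute an authentication tag that only holders of the secret can create."""
    return hmac.new(secret, message, hashlib.sha256).digest()

def verify(secret, message, tag):
    """Check that the message was tagged with the same secret and not modified."""
    return hmac.compare_digest(sign(secret, message), tag)

secret = b"known only to sender and recipient"   # hypothetical shared secret
message = b"See you at eight"
tag = sign(secret, message)

print(verify(secret, message, tag))               # unmodified message
print(verify(secret, b"See you at nine", tag))    # modified on the way
print(verify(b"+49 170 0000000", message, tag))   # a phone number is no secret
```

Someone who only knows public information such as a phone number cannot produce a valid tag, and a modified message no longer matches its tag.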

What do people actually want? Security, not so much.

That is enough for the basic course. But let's be honest: e-mail and SMS suffer from the same problems. And ICQ and Skype do not offer theoretically complete protection either, although a secret is used there, namely the login credentials (that is the wrong kind of secret, but we do not want to cover that here). And DE-Mail is something to get upset about, because it promises security that does not exist.

Hardly anyone cares about all this. And I believe this is the core, an important insight: people do not necessarily want real security at all. Most of the time they simply do not care whether “someone” can read along. Is the general public somewhere between naive and deluded there? Perhaps, but it does not matter. Conceptually perfect IT security and everyday human communication do not really go together. We saw the same indifference in the NSA scandal. Where is the outcry? #aufschrei? #aufschrei3000? To some extent I do not care about the whole thing either; after all, I use ICQ, send e-mails and SMS, and have recently become a WhatsApp user as well. And yet I know exactly how each of these technologies can be attacked. Have you really never captured your flatmates' messages with tcpdump on the router ;-)?

But please be clear about what is happening here.

What is important in my opinion: awareness of who can collect data from whom, and at what scale. And awareness that you may be selling yourself. Have a look at this official blog post by WhatsApp from 2012:

Remember, when advertising is involved you the user are the product.

There they explain that the WhatsApp user is not the product, because they do not use advertising. Then they talk about their oh-so-great architecture and say that the simple form of communication is their product, not the user or their data:

That’s our product and that’s our passion. Your data isn’t even in the picture. We are simply not interested in any of it.

19 billion $ for what exactly? Oh, right, of course.

The above always sounded dirty. Recently, however, these statements appear in a particularly mangy light. Let me put it simply:

  • 19 billion $ for an (IT) architecture that many would have done better? Nah.
  • 19 billion $ for a huge number of users? Yes.

So what is the product? The users, exactly, as always. Be aware of that.

What else I want to say: […] and BitTorrent Chat

If you ever really need secure communication, you have to know where to get it. The media have just settled on Threema. Good for the Swiss; the cash registers are surely ringing nicely. As far as I can see, it is cryptographically solid. They have adhered to recognized standards. And they work with real secrets. The payload, that is, the message contents, seems secure. But you have to know that the people at Threema can of course also see and collect metadata, that is, who communicates with whom, when, how much and so on (basically everything except what and perhaps why :-)). In addition, Threema is not free of charge, unlike some competitors. By the way: you do not have to feel guilty when you install a free app. The Flappy Bird guy had 50,000 $ of daily advertising revenue just from being present in the respective app stores.

I would like to draw your attention to […] and BitTorrent Chat. […] has been developed for quite some time by three Swedes. Through their planning and information policy they make an extremely likeable and professional impression. They are close to release and people are already tweeting at them that they should ship right now, blah blah, but they react quite coolly:

A car without wheels may be 99% complete but is pretty useless, right?

So, […]: remember that name. It makes a better impression than Threema. BittorrentChat is also very promising, in a different way. As always around “Torrent”, a decentralized approach is pursued here. A self-regulating P2P network. Only with such an approach can anonymity be achieved (similar to TOR), only in this way can the efficient collection of metadata be prevented. BittorrentChat is not finished yet either, but it is close to release.