What is the recommended way to concatenate (two or a few) byte strings in Python 3? What is the recommended way to do so if the code should work in both Python 2 and 3?
In Python 2 we can easily concatenate two or more byte strings using string formatting:
    >>> a = "\x61"
    >>> b = "\x62"
    >>> "%s%s" % (a, b)
    'ab'
In Python 3 it is payback time: `repr("%s" % b"a")` semi-intuitively evaluates to `'"b\'a\'"'` at the Python 3(.3) prompt, and `b"%s" % b"a"` raises `TypeError: unsupported operand type(s) for %: 'bytes' and 'bytes'` in Python 3.0 through 3.4 (bytes objects only gained %-formatting in Python 3.5, via PEP 461). This is the result of Python 3's strict distinction between text (a sequence of Unicode code points) and bytes (a sequence of raw bytes). Consequently, in Python 3 the concatenation of byte strings via string formatting yields something entirely different from what Python 2 does:
    >>> a = b"\x61"
    >>> b = b"\x62"
    >>> "%s%s" % (a, b)
    "b'a'b'b'"
The outcome is text (a sequence of Unicode code points) instead of a byte string (a sequence of bytes). The `%s` conversion calls `str()` on each bytes object, and for bytes `str()` returns the same thing as `repr()`, so the result is the concatenation of the representations (`__repr__()`) of these byte strings.
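A minimal sketch, assuming Python 3.5 or later, that reproduces the pitfall and shows the alternative PEP 461 made available: with a bytes format string, %-formatting returns bytes again, so the two-item case can be written with formatting after all.

```python
a = b"\x61"
b = b"\x62"

# A str format string calls str() on each bytes object, which gives the
# same text as repr(), e.g. "b'a'". The result is str, not bytes.
text = "%s%s" % (a, b)
print(type(text).__name__, text)  # str b'a'b'b'

# Since Python 3.5 (PEP 461), a bytes format string accepts bytes
# arguments for %s and produces bytes.
joined = b"%s%s" % (a, b)
print(type(joined).__name__, joined)  # bytes b'ab'
```

This does not help code that must also run on Python 3.0 to 3.4, where the second expression raises the `TypeError` quoted above.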
Concatenating two byte strings
In Python 3, the `+` operator (implemented by `__add__`) returns what we want:
    >>> a + b
    b'ab'
`+` works in Python 2 as well:
    >>> a + b
    'ab'
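A short sketch of the `+` approach on Python 3, which also shows that mixing bytes with text is rejected instead of being converted implicitly. Encoding the text first (as ASCII here, an assumption about the data) is one way to make both operands bytes.

```python
a = b"\x61"
b = b"\x62"

# bytes + bytes creates a new bytes object; a and b are unchanged
combined = a + b
print(combined)  # b'ab'

# bytes + str raises TypeError in Python 3 (no implicit conversion)
mixing_rejected = False
try:
    combined + "c"
except TypeError:
    mixing_rejected = True
print(mixing_rejected)  # True

# Encode the text explicitly so both operands are bytes
extended = combined + "c".encode("ascii")
print(extended)  # b'abc'
```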
Concatenating many byte strings
The above is the preferred method if you want to concatenate only two byte strings. If you have a longer sequence of byte strings to concatenate, the good old `join()` works in both Python 2.7 and 3.x.
Python 3 output:
    >>> b"".join([a, b])
    b'ab'
Python 2.7 output:
    >>> b"".join([a, b])
    'ab'
In Python 3, the `b` prefix on the separator string is crucial: `bytes.join()` joins a sequence of bytes objects with a separator that must itself be of type bytes, and a str separator such as `""` raises a `TypeError`. In Python 2.7, the `b` prefix has no effect, since string literals are byte strings by default anyway. In Python 2.5 and earlier, the `b` prefix is a syntax error (it was introduced in Python 2.6).
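As a sketch of a version-agnostic approach, the hypothetical helper `concat_bytes` below (not a library function) simply wraps `b"".join()`. The same source should run on Python 2.6+ and 3.x, because `b""` is a `str` on Python 2 and a `bytes` object on Python 3. The last lines illustrate the Python 3 `TypeError` for a text separator.

```python
def concat_bytes(*parts):
    """Join any number of byte strings into a single byte string.

    Hypothetical helper for illustration. b"" is str on Python 2 and
    bytes on Python 3, so the same call works on both.
    """
    return b"".join(parts)


chunks = [b"\x61", b"\x62", b"\x63"]
result = concat_bytes(*chunks)
print(repr(result))  # b'abc' on Python 3, 'abc' on Python 2

# On Python 3, a str separator cannot join bytes items
text_separator_rejected = False
try:
    "".join(chunks)
except TypeError:
    text_separator_rejected = True
print(text_separator_rejected)  # True on Python 3
```

Passing the parts as separate arguments keeps call sites short; for an existing list, `b"".join(the_list)` can of course be called directly.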
For a largish sequence of byte strings, the `join()`-based concatenation is clearly more efficient than the