What is the recommended way to concatenate two (or a few) byte strings in Python 3? What is the recommended way to do so if the code should work in both Python 2 and 3?
In Python 2 we can easily concatenate two or more byte strings using string formatting:
>>> a = "\x61"
>>> b = "\x62"
>>> "%s%s" % (a, b)
'ab'
It is payback time: in Python 3(.3), repr("%s" % b"a") semi-intuitively returns '"b\'a\'"', and b"%s" % b"a" throws TypeError: unsupported operand type(s) for %: 'bytes' and 'bytes' (Python 3.5 later reintroduced %-formatting for bytes via PEP 461). This is the result of Python 3’s strict distinction between text (a sequence of Unicode code points) and bytes (a sequence of raw bytes). As a consequence, in Python 3 concatenating byte strings via string formatting yields something entirely different from what Python 2 does:
>>> a = b"\x61"
>>> b = b"\x62"
>>> "%s%s" % (a, b)
"b'a'b'b'"
The outcome is text (a sequence of Unicode code points) instead of a byte string (a sequence of bytes). In Python terminology, the result of this "concatenation" is the concatenation of the representations of these byte strings: %s calls str() on each bytes object, which returns the same text as its repr() / __repr__().
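A minimal Python 3 sketch that contrasts the two kinds of formatting described above. The second half assumes Python 3.5 or later, where bytes objects support %-formatting themselves; on 3.3/3.4 that line raises the TypeError quoted earlier.

```python
a = b"\x61"
b = b"\x62"

# Text formatting: %s calls str() on each bytes object, giving its repr-style text
text_result = "%s%s" % (a, b)
print(type(text_result), text_result)  # <class 'str'> b'a'b'b'

# Bytes formatting (Python 3.5+ only): the result stays a bytes object
bytes_result = b"%s%s" % (a, b)
print(type(bytes_result), bytes_result)  # <class 'bytes'> b'ab'
```

The format string's own type decides the result type, which is why the bytes literal prefix on the format string matters.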
Concatenating two byte strings
In Python 3, the + operator (implemented by __add__) returns what we want:
>>> a + b
b'ab'
__add__ (that is, +) also works in Python 2:
>>> a + b
'ab'
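A small Python 3 sketch of the two-operand case. The str operand in the second half is an illustrative assumption, added to show that + refuses to mix bytes and text instead of silently producing a repr like %s does.

```python
a = b"\x61"
b = b"\x62"

# + on two bytes objects returns a new bytes object
combined = a + b
print(combined)  # b'ab'

# Mixing bytes and str is rejected in Python 3
try:
    a + "b"
except TypeError as exc:
    print("TypeError:", exc)
```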
Concatenating many byte strings
The above is the preferred method if you want to concatenate only two byte strings. If you have a longer sequence of byte strings to concatenate, the good old join() works in both Python 2.7 and 3.x.
Python 3 output:
>>> b"".join([a, b])
b'ab'
Python 2.7 output:
>>> b"".join([a, b])
'ab'
In Python 3, the b prefix on the separator string is crucial: a sequence of bytes objects must be joined with a separator that is itself of type bytes. In Python 2.7, the b prefix is accepted but has no effect, since string literals are byte strings by default anyway. In Python versions before 2.6, the b prefix is a syntax error.
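A short Python 3 sketch of why the separator's type matters; the three-element list and the b"-" separator are arbitrary examples, not from the original.

```python
chunks = [b"\x61", b"\x62", b"\x63"]

# A bytes separator joins bytes items into a single bytes object
joined = b"".join(chunks)
dashed = b"-".join(chunks)
print(joined)  # b'abc'
print(dashed)  # b'a-b-c'

# A str separator cannot join bytes items
try:
    "".join(chunks)
except TypeError as exc:
    print("TypeError:", exc)
```

The same b"".join(...) line runs unchanged on Python 2.6+, which is what makes it the portable choice.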
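For a largish sequence of byte strings, join()-based concatenation is clearly more efficient than +-based concatenation: repeated + creates a new bytes object and copies all previously accumulated data at every step, so its cost grows roughly quadratically with the number of pieces, whereas join() computes the total length once and copies each piece only once.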
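A rough Python 3 benchmark sketch for checking this on your own machine. The function names, the chunk count, and the chunk size are hypothetical choices for illustration; actual timings depend on hardware and interpreter, so no figures are claimed here.

```python
import timeit

# Hypothetical workload: 5,000 byte strings of 10 bytes each
chunks = [b"x" * 10 for _ in range(5000)]


def concat_with_plus(parts):
    # In CPython each += on bytes allocates a new object and copies
    # everything accumulated so far
    result = b""
    for part in parts:
        result += part
    return result


def concat_with_join(parts):
    # join sizes the output once, then copies each part a single time
    return b"".join(parts)


# Both approaches must produce identical bytes
assert concat_with_plus(chunks) == concat_with_join(chunks)

for func in (concat_with_plus, concat_with_join):
    seconds = timeit.timeit(lambda: func(chunks), number=5)
    print(f"{func.__name__}: {seconds:.4f} s")
```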