Concatenate byte strings in Python 3

What is the recommended way to concatenate two (or a few) byte strings in Python 3? And what is the recommended way if the code should work on both Python 2 and Python 3?

In Python 2 we can easily concatenate two or more byte strings using string formatting:

>>> a = "\x61"
>>> b = "\x62"
>>> "%s%s" % (a, b)
'ab'

It is payback time: in Python 3(.3), repr("%s" % b"a") semi-intuitively returns '"b\'a\'"', and b"%s" % b"a" throws TypeError: unsupported operand type(s) for %: 'bytes' and 'bytes' (%-formatting for bytes only returned in Python 3.5, with PEP 461). This is the result of Python 3's strict distinction between text (a sequence of Unicode code points) and bytes (a sequence of raw bytes). Consequently, in Python 3 the concatenation of byte strings via string formatting yields something entirely different from what Python 2 does:

>>> a = b"\x61"
>>> b = b"\x62"
>>> "%s%s" % (a,b)
"b'a'b'b'"

The outcome is text (a sequence of Unicode code points) instead of a byte string (a sequence of bytes). More precisely, %s calls str() on each argument, and str() of a bytes object returns its repr(), so the result is the concatenation of the representations (repr() / __repr__()) of these byte strings.
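To make the point above concrete, here is a minimal sketch for Python 3 that checks the type of the formatted result and compares it with the concatenated repr() values. The variable name formatted is only illustrative.

```python
a = b"\x61"
b = b"\x62"

# %s calls str() on each argument; for bytes, str() falls back to repr().
formatted = "%s%s" % (a, b)

# The result is text (str), not a byte string.
print(type(formatted).__name__)

# It equals the two representations glued together.
print(formatted == repr(a) + repr(b))
```

Note that running Python 3 with the -b flag emits a BytesWarning for this implicit str() conversion, which is a handy way to find such spots in existing code.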

Concatenating two byte strings

In Python 3, the + operator (__add__) returns what we want:

>>> a + b
b'ab'

The + operator also works in Python 2:

>>> a + b
'ab'
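One thing to keep in mind with + on Python 3 is that both operands must be bytes. The following sketch (variable names are illustrative) shows that mixing bytes and text is rejected rather than silently coerced as in Python 2:

```python
a = b"\x61"
b = b"\x62"

# Both operands are bytes, so the result is bytes as well.
combined = a + b

# On Python 3, adding text to bytes raises TypeError.
try:
    a + "b"
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(combined)
print(mixed_ok)
```

If one of the operands is text, it has to be encoded explicitly first, for example with "b".encode("ascii").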

Concatenating many byte strings

The above is the preferred method if you want to concatenate only two byte strings. If you have a longer sequence of byte strings to concatenate, the good old join() works in both Python 2.7 and 3.x.

Python 3 output:

>>> b"".join([a, b])
b'ab'

Python 2.7 output:

>>> b"".join([a, b])
'ab'

In Python 3, the 'b' prefix on the separator is crucial: join() called on a bytes separator joins a sequence of bytes objects, and a str separator does not accept bytes items. In Python 2.7, the 'b' prefix is ignored, since string literals are byte strings by default anyway. In versions before Python 2.6, the 'b' prefix is a syntax error.
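As a small illustration of why the separator type matters on Python 3, the sketch below joins the same list once with a bytes separator and once with a text separator. The list name parts is an assumption made for the example.

```python
parts = [b"\x61", b"\x62", b"\x63"]

# A bytes separator joins bytes items and returns bytes.
joined = b"".join(parts)
dashed = b"-".join(parts)

# A str separator refuses bytes items on Python 3.
try:
    "".join(parts)
    str_separator_ok = True
except TypeError:
    str_separator_ok = False

print(joined)
print(dashed)
print(str_separator_ok)
```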

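For a largish sequence of byte strings, the join()-based concatenation is clearly more efficient than the +-based one: every + allocates a new bytes object and copies both operands, so a chain of + operations copies the accumulated data again and again, whereas join() computes the total length once and copies each piece a single time.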
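The difference can be measured with a rough sketch like the following. The helper names concat_plus and concat_join as well as the chunk size and count are arbitrary choices for illustration; actual timings depend on the interpreter and machine, so none are quoted here.

```python
import timeit


def concat_plus(chunks):
    """Concatenate by repeated +, creating a new bytes object each step."""
    result = b""
    for chunk in chunks:
        result = result + chunk
    return result


def concat_join(chunks):
    """Concatenate with a single join() call."""
    return b"".join(chunks)


# 2000 chunks of 16 bytes each: large enough to show a trend.
chunks = [b"x" * 16] * 2000

# Both approaches must produce identical output.
assert concat_plus(chunks) == concat_join(chunks)

plus_seconds = timeit.timeit(lambda: concat_plus(chunks), number=10)
join_seconds = timeit.timeit(lambda: concat_join(chunks), number=10)
print("plus: %.4f s, join: %.4f s" % (plus_seconds, join_seconds))
```

Increasing the number of chunks should widen the gap, since the +-based loop copies the growing result on every iteration.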

  1. […] I Ran into an issue with python3. Python3 and python2 treat strings differently, python2 they are a series of bytes, where as python3 there are a series of unicode characters (hence python3’s byte and bytestring classes). […]

  2. blipton

    I’m running into a similar issue with xor:
    "unsupported operand type(s) for ^: 'bytes' and 'bytes'"

    How would this work in Python 3.4?

    sequence = b"\x01"
    self.putc(b"\xFF" ^ sequence)

    1. Jan-Philip Gehrcke

      Quote: “Bitwise operations only make sense for integers” from https://docs.python.org/3/library/stdtypes.html#bitwise-operations-on-integer-types

      You can construct integers from bytes using classmethod int.from_bytes(), see https://docs.python.org/3/library/stdtypes.html#int.from_bytes

  3. Carl Winbäck

    Thank you for this write-up!
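Regarding the XOR question in the comments above: a minimal sketch for Python 3, assuming both operands have the same length. The helper names xor_bytes and xor_via_int are hypothetical; the first works byte by byte, the second follows the int.from_bytes() suggestion from the reply.

```python
def xor_bytes(left, right):
    """XOR two equal-length bytes objects byte by byte (hypothetical helper)."""
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    # Iterating over bytes yields ints in Python 3, so ^ works per element.
    return bytes(x ^ y for x, y in zip(left, right))


def xor_via_int(left, right):
    """Same result via int.from_bytes() and int.to_bytes() (Python 3.2+)."""
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    value = int.from_bytes(left, "big") ^ int.from_bytes(right, "big")
    return value.to_bytes(len(left), "big")


sequence = b"\x01"
print(xor_bytes(b"\xFF", sequence))    # b'\xfe'
print(xor_via_int(b"\xFF", sequence))  # b'\xfe'
```

The byte-by-byte variant avoids building one large integer, which may be preferable for long inputs.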