Binary Pattern Matching in Elixir

Welcome back! First, a joke:

Don’t trust atoms

They make up everything

Now, onto today's topic: Binary Pattern Matching in Elixir

The Gist

Pattern matching in Elixir is a superpower you probably know about, but chances are that you haven't encountered its little brother, binary pattern matching, yet. Let's fix that.

In essence, Elixir makes it incredibly easy to deconstruct binaries into their more meaningful parts.

To give you an example: Imagine you're using Nerves to read temperature and humidity from a sensor. The sensor spec shows how to interpret the bits:

| bit7 | bit6 | bit5 | bit4 | bit3 | bit2     | bit1     | bit0     |
| ---- | ---- | ---- | ---- | ---- | -------- | -------- | -------- |
| temp | temp | temp | temp | 0    | humidity | humidity | humidity |

Now, before the Nerves-folks crucify me for creating such a nonsensical memory map, let me clarify that I completely made up this table to exactly fit my story-telling needs. So, hush Frank, Gus, and Lars!

With the spec above in hand, you set out to decode the 1s and 0s you receive from the sensor. The pattern match would look like this:

iex> <<temperature::4, _::1, humidity::3>> = <<0b11110111>>
<<247>>
iex> temperature
15
iex> humidity
7

And that's how easy it is to decode binaries using pattern matching!

Look at how similar the spec and the pattern match look! That's one of the big advantages of using Elixir for these problems. It's super readable.
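To reuse the match outside of IEx, you could wrap it in a function. This is only a minimal sketch: the module name SensorReading and the function decode/1 are made up for this example, just like the sensor spec itself. Unlike the match above, it also insists that bit3 is 0 instead of ignoring it.

```elixir
defmodule SensorReading do
  # Hypothetical helper for the made-up spec above:
  # bit7..bit4 = temperature, bit3 = always 0, bit2..bit0 = humidity
  def decode(<<temperature::4, 0::1, humidity::3>>) do
    {:ok, %{temperature: temperature, humidity: humidity}}
  end

  # Anything else (wrong length or bit3 set) is rejected instead of raising.
  def decode(_other), do: {:error, :invalid_reading}
end

reading = SensorReading.decode(<<0b11110111>>)
invalid = SensorReading.decode(<<0b11111111>>)
```

Here, reading holds the decoded temperature (15) and humidity (7), while invalid is an error tuple because bit3 is set.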

Let's break down the pattern match into its individual components.

Bitstrings

First, the <<>> syntax defines a bitstring.

A bitstring is a contiguous sequence of bits in memory.

However, when you write <<1>>, Elixir doesn't create a bitstring with only a single bit but it stores the number as one byte (8 bits) by default.

iex> <<0b00000001>> == <<1>>
true
iex> <<0b10000000>> == <<128>>
true

That means you have to watch out which values you store in the bitstring since values that are larger than a single byte will get truncated.

iex> <<1>> == <<257>>
true

The integer 257 in binary is 0b100000001, but those are 9 bits. Elixir bitstrings reserve only 8 bits per value by default. Thus, the left-most bit is ignored and only 0b00000001 is stored which is equal to 1.
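Put differently, the truncation keeps only the lowest 8 bits of the integer. The following sketch shows that this matches masking with 0xFF or taking the remainder of a division by 256. The variable names are only for illustration.

```elixir
import Bitwise

value = 257

# Keeping only the lowest 8 bits is the same as taking the remainder of 256.
low_byte = value &&& 0xFF

same_as_rem = low_byte == rem(value, 256)

# Storing 257 in a default 8-bit segment gives the same bitstring as storing 1.
same_as_bitstring = <<value>> == <<low_byte>>
```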

To store larger values, you have to specify the bit count with value::size(n) or value::n for short:

iex> <<257::9>>
<<128, 1::size(1)>>
iex> <<value::9>> = <<257::9>>
<<128, 1::size(1)>>
iex> value
257

The ::n notation instructs Elixir to reserve n bits to store the value. This isn't how Elixir displays the bitstring though and that might be confusing.

Storing vs Displaying Bitstrings

Elixir fills the reserved bits from the right to store a value, but splits them into bytes from the left when displaying them.

iex> <<257::10>>
<<64, 1::size(2)>>
iex> <<257::11>>
<<32, 1::size(3)>>
iex> <<257::12>>
<<16, 1::size(4)>>

The first part (64, 32, 16) gets smaller as we reserve more bits because of how Elixir stores versus displays bits. When we reserve 10 bits for 257, Elixir first creates a bitstring of 10 zeros:

# Space added for clarification
00000000 00

Then, Elixir stores the value 257 by filling the bitstring from the right:

Before: 00000000 00
After:  01000000 01

But when it displays the bitstring, Elixir splits the bits into bytes from the left:

# With size: 10
01000000 01
    |    |
<<  64,  1::2 >>

# With size: 11
00100000 001
    |    |
<<  32,  1::3 >>

# With size: 12
00010000 0001
    |    |
<<  16,  1::4 >>

The difference between displaying and storing a bitstring might be confusing, but in practice the bitstring remains one continuous sequence of bits:

iex> <<0b0100000001::10>> == <<257::10>>
true
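If you want to convince yourself, you can inspect the 10-bit bitstring with the built-in guards and size functions. This is a small sketch with made-up variable names:

```elixir
bits = <<257::10>>

bit_count = bit_size(bits)    # 10
byte_count = byte_size(bits)  # 2, rounded up to whole bytes
is_bin = is_binary(bits)      # false, since 10 is not divisible by 8
is_bits = is_bitstring(bits)  # true

# Splitting from the left reproduces what IEx prints: <<64, 1::size(2)>>
<<first_byte::8, remainder::2>> = bits
```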

Until now, we only looked at how Elixir stores integers in a bitstring, but it supports many more types, so let's talk about them next.

Types

When constructing or decoding a bitstring, Elixir works with "segments". For example, to deconstruct a bitstring into one integer, one float, and two bytes, Elixir matches the first bits against an integer, then a float, then two bytes. If any segment fails to match, it raises a MatchError.
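Here is a rough sketch of exactly that match: one integer, one float, and two bytes. The packet layout and the variable names are invented for this example. The second half shows the MatchError you get when the bitstring is too short for the pattern.

```elixir
# One integer (8 bits), one float (64 bits), and two bytes.
packet = <<7, 12.5::float, "ok">>

<<id::integer, reading::float, tag::binary-size(2)>> = packet

# The same pattern against a bitstring that is too short raises a MatchError.
too_short = <<7, 1, 2>>

result =
  try do
    <<_id::integer, _reading::float, _tag::binary-size(2)>> = too_short
    :matched
  rescue
    MatchError -> :no_match
  end
```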

Segments can have one of 9 types:

integer - default type if no type is specified
float - must be 16, 32, or 64 (default) bits
bitstring - can be nested within another bitstring!
bits - alias for bitstring
binary - size is counted in bytes (8 bits each) instead of bits
bytes - alias for binary
utf8 - default encoding of strings
utf16
utf32

Let's see how to use these types in a binary pattern match.

Integer

If no type is specified, Elixir expects an integer segment. Integers are 8 bits by default but the size can be adjusted.

iex> <<1::integer, 2>> == <<1, 2>>
true
iex> <<value::integer-10>> = <<130::10>>
<<32, 2::size(2)>>
iex> value
130
iex> <<0b10000000::8>> == <<128>>
true
iex> <<0b0010000000::10>> == <<128::10>>
true

Floats

Floats are 64 bits by default, but can be smaller if specified:

iex> <<value::float>> = <<12.5>>
# 8 Integers * 8 bits = 64 bits
<<64, 41, 0, 0, 0, 0, 0, 0>>
# 0 10000000010 1001000...000
# | |           |
# Sign (1 bit. 0 for positive, 1 for negative)
#   |           |
#   Exponent (11 bits with 1023 bias for float64)
#               |
#               Significand (52 bits)
#
# significand = 1*2^-1 + 0*2^-2 + 0*2^-3 + 1*2^-4
#
# = (-1)^sign x (1 + significand) x 2^(exponent - bias)
# = (-1)^0 x (1 + 0.5625) x 2^(1026-1023) = 12.5
iex> value
12.5

You can specify the size of the float with the dash modifier and either the short-hand notation -n or the full notation size(n):

iex> <<value::float-32>> = <<12.5::float-size(32)>>
<<65, 72, 0, 0>>
iex> value
12.5

When working with custom-sized floats, you have to watch out for precision loss when the number you want to store can't be represented exactly in the chosen float size:

# 65_504 is the largest number a float16 can store
iex> <<65_504::float-16>> == <<65_504::float-16>>
true
iex> <<65_504::float-16>> == <<65_505::float-16>>
true

In the second example, we try to store 65,505, which is larger than the largest float16 value of 65,504. A float16 can't represent 65,505, so the stored value ends up as the closest representable number, 65,504, without any warning. So, unless you need to save space across billions of floats in your application, better stick to the default float64, which is far less likely to lose precision.
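The same effect shows up with float32 and everyday numbers. The sketch below round-trips two values through 32 bits: 12.5 can be represented exactly, while 0.1 cannot and comes back slightly different from the 64-bit float 0.1.

```elixir
# 12.5 is exactly representable in 32 bits, so it survives the round trip.
<<exact::float-32>> = <<12.5::float-32>>

# 0.1 is not, so the 32-bit version differs from the 64-bit float 0.1.
<<rounded::float-32>> = <<0.1::float-32>>

exact_survives = exact == 12.5     # true
rounded_survives = rounded == 0.1  # false
```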

Bitstrings

You can match nested bitstrings like this:

iex> <<1, value::bitstring>> = <<1, <<2, 3>> >>
<<1, 2, 3>>
iex> value
<<2, 3>>

Or you can use the alias bits:

iex> <<1, value::bits>> = <<1, <<2, 3>> >>
<<1, 2, 3>>
iex> value
<<2, 3>>

Binaries

Binaries are a special type of bitstring:

A binary is a bitstring where the number of bits is divisible by 8.

So, every binary is a bitstring, but not every bitstring is a binary. When matching a binary, don't just assume that the bit count of the underlying bitstring is divisible by 8, because you will receive a MatchError otherwise.
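A short sketch of the difference, using made-up values: a 4-bit bitstring is not a binary, and a trailing binary segment refuses leftover bits that a trailing bitstring segment happily accepts.

```elixir
nibble = <<0b1010::4>>

is_bits = is_bitstring(nibble)  # true
is_bin = is_binary(nibble)      # false, 4 bits is not a whole number of bytes

byte = <<0b10110011>>

# After taking 3 bits, 5 bits remain. A binary segment needs whole bytes,
# so this match fails...
binary_match =
  try do
    <<_head::3, _rest::binary>> = byte
    :matched
  rescue
    MatchError -> :no_match
  end

# ...while a bitstring segment accepts the 5 leftover bits.
<<head::3, rest::bitstring>> = byte
```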

If you match a binary at the end of the bitstring, you don't need to specify its size:

iex> <<1, 2, text::binary>> = <<1, 2, "foo">>
<<1, 2, 102, 111, 111>>
iex> text
"foo"

But to match a binary anywhere else, you must specify its expected size:

iex> <<text::binary-size(3), 1, 2>> = <<"foo", 1, 2>>
<<102, 111, 111, 1, 2>>
iex> text
"foo"

String literals in a binary are encoded as UTF8 bytes, and matching compares those bytes exactly, so text matching is case-sensitive.

iex> <<"foo">> == <<"Foo">>
false

Since binaries are just bytes, you can also use the alias bytes in the match:

iex> <<text::bytes-size(3), 1, 2>> = <<"foo", 1, 2>>
<<102, 111, 111, 1, 2>>
iex> text
"foo"

iex> <<1, 2, text::bytes>> = <<1, 2, "foo">>
<<1, 2, 102, 111, 111>>

Strings

Strings are binaries (sequences of bytes) and UTF8 encoded by default. ASCII characters like the ones below take one byte each, while other characters take up to four bytes.

iex> <<c1::bytes-1, c2::bytes-1, c3::bytes-1>> = <<"bar">>
"bar"
iex> [c1, c2, c3]
["b", "a", "r"]
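Once a string contains non-ASCII characters, matching byte by byte would cut a character in half. The utf8 type avoids that by consuming one whole code point. A minimal sketch, with the accented letter written as an escape sequence so the byte count is unambiguous:

```elixir
# "\u00E9" is the letter e with an acute accent (code point 233).
text = "\u00E9!"

char_count = String.length(text)  # 2 characters
byte_count = byte_size(text)      # 3 bytes: 2 for the accented letter, 1 for "!"

# The utf8 type consumes one whole code point, however many bytes it takes.
<<codepoint::utf8, rest::binary>> = text
```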

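You can change the encoding to UTF16 or UTF32. UTF16 uses 2 bytes for most characters (and 4 bytes for characters outside the Basic Multilingual Plane), while UTF32 always uses 4 bytes per character: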

# With UTF16 encoding
iex> <<char::2-bytes, rest::4-bytes>> = <<"foo"::utf16>>
<<0, 102, 0, 111, 0, 111>>
iex> <<0, "f">> == char
true
iex> <<0, "o", 0, "o">> == rest
true

# And with UTF32 encoding:
iex> <<char::4-bytes, rest::8-bytes>> = <<"bar"::utf32>>
<<0, 0, 0, 98, 0, 0, 0, 97, 0, 0, 0, 114>>
iex> <<0, 0, 0, "b">> == char
true
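The fixed sizes in the matches above work because "foo" and "bar" are plain ASCII. As a hedged sketch of the general case, the utf16 and utf32 types also accept a code point directly, which makes it easy to compare the encoded sizes of an ASCII letter and an emoji:

```elixir
# ?a is the code point of "a"; 0x1F600 is an emoji outside the
# Basic Multilingual Plane.
ascii_utf16 = <<?a::utf16>>
emoji_utf16 = <<0x1F600::utf16>>
emoji_utf32 = <<0x1F600::utf32>>

sizes = {byte_size(ascii_utf16), byte_size(emoji_utf16), byte_size(emoji_utf32)}

# Decoding with the utf16 type reads both halves of the 4-byte encoding.
<<decoded::utf16>> = emoji_utf16
```

So when you match UTF16 text, matching with the utf16 type is safer than assuming 2 bytes per character.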

Bitstrings vs Binaries vs Strings

The hierarchy of strings, binaries, and bitstrings is as follows:

All strings are binaries and all binaries are bitstrings, but not all bitstrings are binaries and not all binaries are strings.
Binaries are bitstrings whose bit count is divisible by 8.
Strings are binaries that contain valid UTF8, with one character taking between 1 and 4 bytes. UTF16 and UTF32 encoded text is still a binary, using 2 or 4 bytes per code point.

Modifiers

Now, let's talk about something fun. Elixir gives you superpowers for decoding numbers exactly how you need them through match modifiers.

Modifiers tell Elixir which extra steps to take when it converts the raw bits into a number. You can tell it to decode signed or unsigned numbers and specify the little or big endianness of multiple bytes.

For example, here's how to decode a bitstring into a signed integer without having to do the conversion from unsigned to signed integer yourself:

iex> <<int::signed-integer>> = <<-100>>
<<156>>
iex> int
-100

iex> <<int::unsigned-integer>> = <<-100>>
<<156>>
iex> int
156

Elixir always displays the unsigned value (156), but the int variable contains the correct signed value (-100).

⚠️ Small detour ⚠️

In case you're interested, this is how you can calculate the signed integer value of a bitstring by hand:

Take the bitstring:
- +100 => 0b01100100
- -100 => 0b10011100
Check the left-most bit for the sign (positive/negative).
- 0 means positive, 1 means negative
If positive, simply convert to decimal:
- +100 => 0b01100100 => 100
If negative, invert all bits, convert to decimal, add 1, and apply negative sign:
- -100 => 0b10011100 => 0b01100011 => 99 + 1 => 100 => -100
- This is called the two's complement method.
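If you prefer code over a checklist, the steps above can be sketched like this. The module TwosComplement and its function to_signed/1 are hypothetical helpers for a single byte. In real code, the signed modifier does all of this for you.

```elixir
defmodule TwosComplement do
  import Bitwise

  # Hypothetical helper: converts an unsigned byte (0..255) to its signed value.
  def to_signed(byte) when byte in 0..255 do
    if (byte &&& 0b10000000) == 0 do
      # Left-most bit is 0: the value is positive, keep it as is.
      byte
    else
      # Left-most bit is 1: invert all 8 bits, add 1, apply the negative sign.
      inverted = (~~~byte) &&& 0xFF
      -(inverted + 1)
    end
  end
end

positive = TwosComplement.to_signed(0b01100100)
negative = TwosComplement.to_signed(0b10011100)

# The signed modifier gives the same answer.
<<builtin::signed-integer>> = <<0b10011100>>
```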

⚠️ Detour end ⚠️

Endianness

In case you're not familiar with Endianness, it describes the position of the most significant byte in a sequence of bytes.

But what does this actually mean? Consider this very realistic example: Imagine your partner is at the supermarket and wants to tell you the price of the grapefruits. For completely normal and not at all made-up reasons, you two can only communicate via morse code. You receive the numbers: 4, 9, 9.

Now, you would assume intuitively that the price per grapefruit is 4.99, right?

But how can you be sure that your partner didn't send the digits in reverse order in which case the price would be 9.94?

Naturally and because you are completely normal people, you discussed beforehand which endianness your morse code communication would have. You agreed to use big endianness, which means that you send the most significant digit first. For the price of grapefruit, this means sending the 4, the digit with the highest place value, first. That's why you know for sure that your partner meant 4.99 and not 9.94.

You could also have used little endianness. Then your partner would send the least significant digit first. So, the sequence would have been 9, 9, 4. To get the price, you would have to reverse the order since our decimal system puts the most significant digit first, resulting in the correct price of 4.99. With binaries, it works the same way, except that the units being ordered are bytes instead of digits.

Endianness is important when you convert multiple bytes into a single number. By default, Elixir assumes big endianness.

iex> <<big::big-integer-size(16)>> = <<1, 2>>
<<1, 2>>
iex> big
258

iex> <<little::little-integer-size(16)>> = <<1, 2>>
<<1, 2>>
iex> little
513
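To see where 258 and 513 come from, here is the same calculation by hand, followed by the reverse direction (encoding a number into bytes). The variable names are only for illustration.

```elixir
# Big endian: the first byte (1) is the most significant one.
big_by_hand = 1 * 256 + 2

# Little endian: the first byte (1) is the least significant one.
little_by_hand = 1 + 2 * 256

# Encoding works the same way in reverse.
big_bytes = <<258::big-integer-size(16)>>
little_bytes = <<258::little-integer-size(16)>>
```

Here, big_bytes is <<1, 2>> and little_bytes is <<2, 1>>.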

You won't see a difference if you convert only a single byte, since endianness applies only to the order of bytes, not to the order of bits within a byte.

iex> <<big::big-integer>> = <<1>>
<<1>>
iex> big
1

iex> <<little::little-integer>> = <<1>>
<<1>>
iex> little
1

Elixir also supports the native endianness modifier, which resolves to big or little depending on the CPU the VM runs on. I suggest you always use big or little explicitly unless you absolutely need native.

Modifier Order

One feature of modifiers is that their order doesn't matter. All these notations are the same:

<<x::little-signed-integer-size(16)>>
<<x::integer-signed-little-size(16)>>
<<x::size(16)-signed-integer-little>>
<<x::16-signed-integer-little>>

If you want to enforce a certain modifier order in your codebase, you can add this nifty Credo check which will check that the modifiers are always ordered like this:

<<x::[endian]-[sign]-[type]-[size]>>

Thank you Noah for recommending this Credo check to me 💜
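If you'd like to verify that the four notations from the start of this section really are equivalent, you can decode the same two bytes with each of them. The bytes are made up for this sketch. As a little-endian signed 16-bit integer, they decode to -2.

```elixir
bytes = <<0xFE, 0xFF>>

<<a::little-signed-integer-size(16)>> = bytes
<<b::integer-signed-little-size(16)>> = bytes
<<c::size(16)-signed-integer-little>> = bytes
<<d::16-signed-integer-little>> = bytes

# All four variables hold the same value.
all_equal = Enum.uniq([a, b, c, d]) == [-2]
```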

Dynamic Match Size

The last feature of binary pattern matching I want to share with you is that you can have dynamically sized matches like this:

iex> <<count::integer, text::binary-size(count)>> = <<26, "Hello there General Kenobi">>
<<26, 72, 101, ... >>
iex> text
"Hello there General Kenobi"

Once you start decoding bitstrings in the real world, you will often encounter the case in which a preceding value defines the length of the value to follow. In our example, the count defines the length of the string to follow.

This also works with other segment types that take a size, including raw bits:

iex> <<count::4, value::size(count)>> = <<0b01001001>>
"I"
iex> count
4
iex> value
9 # or 0b1001

You can also use regular variables to dynamically define match size:

iex> count = 4
4
iex> <<text::binary-size(count)>> = <<"helo">>
"helo"
iex> text
"helo"

This feature is extremely useful if you don't know the size of a segment up front but receive a "meta" or "header" segment which indicates the dynamic size of the segment to follow.
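As a rough sketch of how this looks in a real decoder, the module below reads a sequence of length-prefixed fields until the input is used up. The module name LengthPrefixed, the function decode/1, and the one-byte length header are all assumptions made for this example, not a real protocol.

```elixir
defmodule LengthPrefixed do
  # Hypothetical format: each field is one length byte followed by that many bytes.
  def decode(binary, acc \\ [])

  # Nothing left to read: return the fields in their original order.
  def decode(<<>>, acc), do: {:ok, Enum.reverse(acc)}

  # The length byte decides how many bytes the field segment consumes.
  def decode(<<len, field::binary-size(len), rest::binary>>, acc) do
    decode(rest, [field | acc])
  end

  # The header promised more bytes than the input contains.
  def decode(_other, _acc), do: {:error, :truncated}
end

fields = LengthPrefixed.decode(<<5, "Hello", 6, "Kenobi">>)
broken = LengthPrefixed.decode(<<5, "Hi">>)
```

The first call returns both fields, while the second one returns an error tuple because the header announces 5 bytes but only 2 follow.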

Configure IEx

IEx displays binaries as strings whenever they contain only printable UTF8 characters, which is useful if you actually work with text, but can get in your way when you work with numbers.

Luckily, you can configure IEx to always print out the "raw" values like this:

iex> <<54, 77>>
"6M"
iex> IEx.configure(inspect: [base: :decimal, charlists: :as_lists, binaries: :as_binaries])
:ok
iex> <<54, 77>>
<<54, 77>>

I learned this trick from Geoffrey Lessel's great talk about working with bits at ElixirConf US 2025. Thank you Geoffrey 💜
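If you only need the raw bytes once, for example in a log line, you don't have to reconfigure IEx. A minimal sketch that passes the same binaries option to a single inspect/2 call:

```elixir
bytes = <<54, 77>>

# Default: printable binaries are shown as strings.
as_string = inspect(bytes)

# Same option as in the IEx configuration above, applied to a single call.
as_bytes = inspect(bytes, binaries: :as_binaries)
```

Here, as_string is "\"6M\"" and as_bytes is "<<54, 77>>".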

Conclusion

And that's it! I hope you enjoyed this article! Until next time! Cheerio 👋