# Introduction To Bit Hacking

Bit hacking is what you do when you read or write individual bits of a larger number. Reading or writing bytes is fairly trivial, but a byte is the smallest chunk that most languages would provide functions for. If you want to access the bits, you'd have to use what are called bitwise operators like `or`, `and`, `xor` `not` or `shift`. Since we are fiddling with numbers, the operator to get what we want is not as straightforward as a function name. And it feels low level, hence the term "bit hacking".

There are many reasons why you would want to do this, but the common idea is that you want to read or write binary information that is not ascii or anything that does not already have a library. We will see examples later in this tutorial. Please bear in mind that for simplicity, all numbers in the examples are considered `uint8` (unsigned integers taking 8 bits).

## The bitwise operators

I suppose a good thing to clarify is the difference between the "logical" operators and their "bitwise" counter part because in high level programming we are more exposed to the former. For example we often have an `if` statement and 2 conditions need to be `true`.

``````if itIsTime and foodIsWarm:
eat()
``````

How "truthiness" is defined in each programming language is not necessarily consistant and a source of debate, but regarding numbers, the general rule is "0" is `false` and any other number is `true`. Whereas bitwise operators would do the same thing but on each bit of the number.

We first will see what they do. Let's start with `not` since it only has one operand. It is generally represented with the `~` tilde symbol and basically invert all the bits of the operand.

``````~ 01010101
--------
= 10101010
``````

If we had used a "logical" `not` instead, the result would be `false` or `00000000` since non-zero is `true`.

Now bitwise `and` is generally represented with a single `&`. Each bit in the result will be "1" if the same bit in each operand is "1" as well.

``````  01010101
& 11000011
--------
= 01000001
``````

The bitwise `or` is generally represented with a single `|`. Each bit in the result will be "1" if at least one of either operand has the same bit set to "1".

``````  01010101
| 11000011
--------
= 11010111
``````

The bitwise `xor` is generally represented with `^`. It is similar to `or` except that each bit in the result will be "1" if either operand has the same bit on, but not both. Hence the name "exclusive or".

``````  01010101
| 11000011
--------
= 10010110
``````

The last 2 operators are `shift` operators, you have `shift` left and `shift` right. It basically moves the bits to the left or to the right. There are exceptions, especially with signed numbers, but generally speaking when the bits are moved outside of the length of the number are lost and the bits introduced are set to "0". Nethertheless there is a way to make a "circular" shift where bits that are shifted outside in one direction are coming back on the other side. We'll see that later in the tutorial.

The shifts operators are represented with `<<` and `>>`. The second operand represents the number of shifts.

``````   00111100
<< 00000001  Shift 1 time to the left
--------
=  01111000
``````
``````   00111100
>> 00000010  Shift 2 times to the right
--------
=  00001111
``````

Now that this is clear. Let's see some real life examples.

## Set and combine flags

One common use case of using bitwise operator is when you want to set flags. This is when you want to use each bit of a number in order to store a set of boolean values (yes or no, on or off, true or false). For example in a byte we can store 8 boolean values.

Let's take a very simple real life example. The permissions of a file on a computer are generally represented by a number with 3 flags for your personal access to it: the right to read, write and execute the file (in this specific order). For example if you have the right to read and execute a file, but not the right to write to it, the flags as a binary number would be `101` because only the first and third flags are on. In most programming languages, this is written `0b101`, which is 5 in decimal (`0b` indicates a binary number). If you're only allowed to read it, it would be `0b100`.

The typical way to use this is to have each option represented by a number that is the one with just its single bit on and to set options using `or`. Here is a pseudo code example:

``````READ =    0b100 # 4
WRITE =   0b010 # 2
EXECUTE = 0b001 # 1

setFilePermissions( "/path/to/file.txt", READ | EXECUTE )
``````

It is quite common to set binary options like this, especially in low level languages like C. I could have written the number 5 or `0b101` instead of `READ | EXECUTE`, but the purpose here is to give a name to options once and for all and then use this layer of abstraction to make the code legible.

Each number has only one "1", and represents only one flag. Therefore when you use the `or` bitwise operator, you basically combine the flags. But in the grand scheme of things the principle is that if you want to "switch" a specific bit on in a number, you just `or` it with a number which only has this bit on. For example if you want to make sure the third bit of a number is on, you `or` it with `0b100`.

On a side note, I said we've made the code more "legible". It is true but there is something missleading and this is where the term "hacking" reaches its full potential. While being more legible than "5", it still looks like "READ or EXECUTE" where you actually set both and it is tempting to use `and` instead. Trust me you'll get used to it, but one way to see it is as a list of things I can do: "I can read or execute".

In boolean algebra, `or` is often represented with a `+` sign, and `and` is represented with a `.` multiplication sign, which kind of helps, but this is mainly because the rules for reducing an equation are respectively quite similar.

As you can see on the Wikipedia page about bit masking, most bitwise operators can be seen as some sort of masking, but I think it is more obvious with the `and` operator. We generally use it to filter out a specific bit or a portion of bits with what is called a mask. And then you can test if the value is not zero. If it is, it means the bit is on. Let's use again our file permissions example. I have the permission as a number and I want to test if I have the permission to read it.

``````READ =    0b100 # 4
WRITE =   0b010 # 2
EXECUTE = 0b001 # 1

filePermission = getPermission("/path/to/file.txt")

if (filePermission & READ) != 0:
print "yes"
else:
print "no"
``````

Here is what happens if the file permission is `0b101`:

``````  101
& 100
---
= 100
``````

You can see the 3rd bit starting from the right is the only bit in both numbers. Since it is in the first `and` the second number, this bit is set in the result. And as a consequence is not "0".

And here is what happens if the file permission is `0b011`:

``````  011
& 100
---
= 000
``````

Here the operand don't share any bit in common, therefore the result has "0"s everywhere.

Now you can see how the second number acts as a mask. Every bit that is "0" in the mask will be "0" in the result. I like to think about them as "discarded". Bits that are set to "1" on the mask are kept exactly like they are on the first number. I like to think about them as "preserved".

Let's say I wanted to keep only the 4 lower bits of a byte, I could `and` it with the mask `0b00001111`.

``````  10100011
& 00001111
--------
= 00000011
``````

See? It "discards" from the original number all the bits that are "0" in the mask.

## Compact Data

Let's say we wanted to save numbers in a file and we want this file to be as compact as possible. We know that these numbers can only be between 0 and 15. You can represent a number between 0 and 15 with only 4 bits. Therefore if we want to make it compact, we could save 2 numbers in 1 byte.

A way to achieve this is to `shift` one of the numbers 4 times to the left and then use `or` to combine them. Since both numbers are below or equal to 15, we know that any bit above the 4th bit is necessarily irrelevant and set to zero. Therefore shifting one of the numbers by 4 bits to the left allows us to combine the 2 numbers without having an overlap. All the significant bits of each number is in its own half.

``````number1 = 0b00001010 # 10
number2 = 0b00001100 # 12
shiftedNumber1 = number1 << 4 # 0b10100000
combined = shiftedNumber1 | number2 # 0b10101100
``````

Then you can save these numbers to a file and it will take half the space. In order to read them from the file. You can use `and` to mask the relevant part and shift the first one in the opposite direction.

``````combined = 0b10101100
number1 = (combined & 0b11110000) >> 4 # 0b00001010
number2 = combined & 0b00001111 # 0b00001100
``````

This process is actually how you would print bytes in hexadecimal (e.g. "A0") because each character is exactly between 0 and 15. It is split in 2 characters. Therefore you would first extract both hex numbers and then print their characters between "0" and "F".

## Circular Shift

As mentioned earlier, there is a way to make a `shift` "circular", that is to say a `shift` where the bits that overflow on one side are coming back on the other side. Here is how you would do a left `shift` of 3:

``````number = 0b11110000
circularLeftShifted = (number << 3) | (number >> (8 - 3)) # 0b10000111
``````

It should be quite self-explanatory. We first do a normal left `shift` and we combine it using `or` with the same number that is shifted in the orther direction by the complementary number of places.

## XOR The Space Sheriff

I am sure you'll notice that I have kept `xor` for the end. Honestly I struggled to find a good use case for `xor` that is neither too simple, nor too complicated. But there is actually a good reason for that: `xor` is actually the wizard of them all. I mean all bitwise operators are magical in their own way, but `xor` is the most intriguing. It is probably why X-OR, one of the greatest space sheriff of all time is named after it (in the french version). This is why instead of detailed examples, we will concentrate on its interesting properties.

On the simple end of the spectrum, you will find that we can use `xor` the same way we use `or` in our file permissions example, except that instead of setting the bit to "1", it toggles it. This is already a very interesting property. In actually does the same thing as the `not` operator, except you can control which bits are inverted. You mask with "1"s the bits you want to invert/toggle.

Another particularity of `xor` is that if you `xor` a number with itself, it returns all "0"s. This is why it is used in checksums to compare numbers. It is also used with graphics, hashing, compression, cryptography. It many cases, one particular interesting aspect of `xor` that is exploited is the fact that the distribution probability of its output is 50% which is not the case with `or` and `and`.

Let's check the truth table or these operators to visualize this:

``````a | b | a AND b     a | b | a OR b     a | b | a XOR b
---------------     --------------     ---------------
0 | 0 | 0           0 | 0 | 0          0 | 0 | 0
0 | 1 | 0           0 | 1 | 1          0 | 1 | 1
1 | 0 | 0           1 | 0 | 1          1 | 0 | 1
1 | 1 | 1           1 | 1 | 1          1 | 1 | 0
``````

As you can see, the output of `and` has more chances to be "0" and the output of `or` has more chances to be "1". Only the output of `xor` is balanced. This makes it interesting for hash functions. You can combine numbers and because of its balanced output, you reduce the chances of two keys clashing.

I know all this is a little bit abstract but in its most simple form, if you want to hash a list of numbers (or anything since everything is a number in a computer) into one number for a quick comparison, you can `xor` them all.

``````hashForPin = 4 ^ 2 ^ 9 ^ 6
hashForDad = 'd' ^ 'a' ^ 'd'
``````

Please bear in mind that there is a big problem with this, so don't use it in your code unless you know the consequences. It does a good job at avoiding collisions (the output being the same for other numbers or words), but it does not take order into account. That is to say, if you hash the word "dad" and the word "add", the result would be the same for both. This is why there are tons of hashing algorithms that are all way more complicated than just `xor`ing the numbers. But it is the building block for them.

Obviously the fact that the order don't matter could be exploited as well. It could be exactly what you want.

## Conclusion

Parts in code where you use bitwise operators are generally hard to read, therefore I encourage you to bury these under a specific function with a clear name. In order to give more examples, we will publish another article commenting a project using some of these principles.