Sunday, November 23, 2008
Counting Bits
There are many reasons we need to count the number of bits set in a data. It can be used to calculate the parity of a byte. To find the hamming distance etc.There are many ways to do it in C / C++.
Method 1
int countBits(unsigned int x){int i=0;while(x != 0){if(x & 0x1){++i;}x >>= 1;}return i;}
How it works?
Unlike the assembly language C or C++ don't have a operator to test if the bit is set. Here we mask the rest of the bits except the least significant bit and test if it is zero or not. It is guaranteed to work for unsigned number because the C++ standard guarantees it.
Shift operators
The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a non-negative value, the value of the result is the integral part of the quotient of E1/2E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
What it means is , when you shift right, the most significant bit is padded with zero as show in the diagram below.
Disadvantage:
For the worst case when least significant and the most significant bits are set eg. 10000001.
Method 2
This method is an exercise in Kernighan and Ritchie's "The C programming Language" book.
int countBits(unsigned int x){ int i=0;
while (x != 0) { ++i; x &= x - 1; // deletes the rightmost bit that is set in x } return i;}
How it works?
This method uses the property of subtracting a binary number with one. When its done the least significant bit set will be reset.
For decimal 3 which is 011 in binary, the following diagram illustrates the process.011 value & 010 value -1 count = 1 ------ 010 010 value & 001 value -1 count = 2 ------ 000As we see the bit set count is 2.
Advantage:
The number of loops depends on the number of bits set, even for the worst case eg. 10000001 it does it in two loops.
Method 3
Using lookup table. This speeds the operation because we are doing a nibble (4 bits ) at a time instead of a bit at a time.
const int int2bitCount[16] ={0,1,1,2,1,2,2,3,1,2,3,3,2,3,4,4};int countBits(unsigned int x){int count = 0;while(x != 0){count += int2bitCount[ x & 0xf ];x >>= 4;}return count;}
How it works?
The lookup table stores the number of bits that are set for its index. Here we are creating a lookup table for a nibble. 4 bits at a time seems to be a best compromise for space / time. We are also assuming that the integer is a multiple of 4 bits.
Advantage:
The number of loops for the worst case is sizeof( int ) / 4.
This is how you invoke it.
#include <iostream>
int main(int argc, char* argv[])
{
unsigned int x = 0xfffe;
std::cout << std::hex << x << " has " << std::dec << countBits(x) << " bits" << std::endl;
return 0;
}
Labels: c++