CRC32

Theory of CRCs
==============

The definitive online CRC documentation is by Ross Williams:
http://www.ross.net/crc/crcpaper.html

...and I recommend you read that if you want to know what's going on.  I
wrote the notes here to get my own head around the subject, but if they help
anyone else, good.

* Fields

A field is something that supports multiplication and addition.  It always
has at least two elements, 0 (the additive identity) and 1 (the
multiplicative identity).  Adding or multiplying any two elements of the
field gives another element of the field (i.e. the field is "closed").
Addition and multiplication in any field are:

* commutative - they work both ways around, i.e. a+b=b+a.

* associative - you can move brackets around, i.e. a+(b+c)=(a+b)+c.

There are other properties we don't care about right now.

Note that addition and multiplication in some given field *don't* have to
correspond to the familiar concepts of addition and multiplication.

An example of a field would be the rational numbers (i.e. all numbers of the
form A/B where A an any integer and B any positive integer).  In this case
addition and multiplication work like you expect.

* Polynomials

A polynomial over some field is a sum of finitely many terms where each term
is a member of that field multiplied by a unique non-negative integral power
of a variable.  The degree of the polynomial is the highest power that
occurs in it.

e.g. 5X^7 + 2.5X^2 + 1 is a polynomial of degree 7 over the rational
numbers.

X is just a placeholder; we could evaluate the polynomial by substituting a
value for X but we don't care about that here.

* GF(2)

GF(2) is the field with exactly two elements: 0 and 1.  (If you are familiar
fields then it is clear that) addition is equivalent to bitwise XOR; this is
obviously convenient for computers.  CRC theory involves polynomials over
GF(2) such as:

P(X) = X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X+1

Note that if we know the degree of a polynomial over GF(2), and its more
than 0, we only have to care about storing the coefficients of power *less*
than the degree.  This will come in handy later.

* Division Of Polynomials

Polynomials can be divided to produce a quotient and a remainder (both
polynomials).  This isn't the same as evaluating them at some value and then
dividing the results.

We always divide a dividend by a divisor, to produce a quotient and a
remainder (or "modulus") which are both polynomials.

A simple example is

   X^4 + X^2
   ---------   =  X^3 + X, with remainder 0
       X

Here is a less simple example:

   X^4 + X^2 + 1
   -------------  =  X^3 + X^2, with remainder 1        (in GF(2))
       X + 1

You can compute this using the usual long division algorithm that is
commonly applied to integers.  This is actually easier in the polynomials of
GF(2) than in either the integers or the polynomials of the rationals,
because each "digit" (i.e.  coefficient) is just a 0 or a 1, and the
subtraction is an exclusive OR.  (It's also more difficult in that you're
doing arithmetic using unfamiliar operations.)

Here is the working for the division above:

              X^3 + X^2 +   0 +   0  <-- quotient
      .----------------------------
X + 1 | X^4 +   0 + X^2 +   0 +   1
        X^4 + X^3
	   ---------
	         X^3 + X^2
		 X^3 + X^2
		 ---------
		         0 +   0
			       0
			 -------
			       0 +   1
			             0
			       -------
			             1  <-- remainder

You can check that the answer is correct by multiplying the quotient by the
divisor and adding the remainder: the answer should be equal to the
dividend.

Given that all the coefficients are 0 or 1, we can omit the variable and the
addition signs and just write the coefficients:

      1100
   .------
11 | 10101
     11
	--
	 11
	 11
	 --
	  00
	  00
	  --
	   01
	   00
	   --
	    1

* CRCs

A byte can be converted into a polynomial over GF(2) of degree 7 or less.
If this is B(X) and P(X) is some fixed polynomial of degree N then the CRC
is the remainder of B(X) * X^N/P(X).

In practice we are interested in the CRC of a stream of zero or more bytes,
but this is just same idea for longer B.

(In reality the order of bits in B varies between specifications for
different CRCs, and sometimes there are some additional wrinkles; but we'll
come to those.)

In practice of course we don't do the computation in terms of polynomials,
we leave bytes as bytes.

A computation might look like this.  Here P=1010, or X^3+X, and B=11010101
(or X^7+X^6+X^4+X^2+1).  Note that muliplying by X^3 is equivalent to adding
three zero bits on the right.

          11101 000      (in reality we ignore the quotient)
P    .------------------
1010 | 11010101 000
       1010 | |   |
	  ---- | |   |        111
	   1110| |   |
	   1010| |   |
	   ----v |   |        100
	    1001 |   |
	    1010 |   |
	    ---- |   |        011
	     0110|   |
	     0000|   |
	     ----v   |        110
	      1101   |
	      1010   |
	      ----   |        111
	       111 0 |
	       101 0 |
	       ----- |        100
	        10 00|
		10 10|
		-----v        010
		 0 100
		 0 000
		 -----        100
		   100       (the remainder)

Doing this by hand is one thing, we want to formalize it so a computer can
do it.

* A Basic Algorithm for CRC Computation

Let P be the binary representation of the divisor polynomial, of degree N,
and so N+1 bits long.  Let B be a message, with N extra zero bits added.  R
will be the remainder, N bits long (really it should be N+1 but it's always
safe to ignore the last bit).

Algorithm 1.

1) let R be 0.

2) if there are no more bits in B, the answer is R.  Stop.

3) shift R left by one.  The new bit 0 should be the next bit of the
message.  We also shift a bit out of the left of R.

4) If this bit is 0 go back to 2.

5) Otherwise it is 1.  XOR R with the bottom N bits of P and go back to step
2.

If we apply this algorithm to the above division then we get the folowing
steps:

R=000
B=11010101+000
P=1010

R<-001  B<-1010101+000  shifted 0
R<-011  B<-010101+000   shifted 0
R<-110  B<-10101+000    shifted 0
R<-101  B<-0101+000     shifted 1
R<-111
R<-110  B<-101+000      shifted 1
R<-100
R<-001  B<-01+000       shifted 1
R<-011
R<-110  B<-1+000        shifted 0
R<-101  B<-000          shifted 1
R<-111
R<-110  B<-00           shifted 1
R<-100
R<-000  B<-0            shifted 1
R<-010
R<-100  B<-empty        shifted 0
stop with R = 100.

Shifting R left corresponds to looking at the next term in the
dividend, looking at the bit shifted out the top corresponds to
comparing the divisor with the remainder, and the XOR to
subtracting the divisor from the remainder.

* Scaling Up

Normally we would have a much larger P: say degree 32 (so 33 bits long - in
practice we store this in a 32 bit word with the topmost bit implicit; we
don't need it anyway.)

Also it's inefficient and inconvenient to process every bit separately when
we only ever handle streams of bytes; we want to process whole bytes at
once.  In other words we want to somehow encapsulate 8 iterations of the
above algorithm into a single step.

Imagine we've just shifted 8 bits into R (and so 8 bits out of the top).  If
the top bit of the byte we've shifted out (call it S) is set then we XOR in
the 32 bits of P to the top 25 bits of R, and the bottom 7 bits of S.  For
the next bit we XOR with the the top 24 bits of R and the bottom 6 bits S;
and so on.

Note that we only care about XORing things into S for the purposes of
determining the next few values of S - which in turn only affects what we
XOR in to R; we'll shortly discard S and pick a new one.  So if we can take
a single value of S and use that to do all the XORs into R at once, we never
need alter S.

XOR is associative, so we can do this.

How do we calculate the table?  Given a byte S we're going to XOR R with
some value, so if we set R to 0 then the resulting R will be the value to
XOR R with for that S.  So we iterate over each S and do the calculation
"long-hand", and that gives us the lookup table.

crc32tab.pl contains an example.

If T is the table then our algorithm is then:

Algorithm 2.

1. Let R=0.
2. If B is now empty, the answer is R.  Stop.
3. Let S be the top 8 bits of R.
4. Shift R left by 8.
5. Replace the bottom 8 bits of R with the next byte of the message.
6. XOR R with T[S].
7. Goto 2.

* Another Optimization

The extra N bits that we have to paste onto the end of the message are a
nuisance.  It turns out that there is an optimization that can get rid of
them.

First, consider a byte from the message as it is introduced into R
(in step 5) and propagated up through it.  When it comes out the
top it will have been XOR'd with four different values, each taken
from T.

Second, observe that the four padding bytes at the end just mean
that the last byte of the message makes it through R to the top.

Thirdly, notice that the first four lookups into T are always for
T[0], which is 0.  So the first four steps don't do any useful
work.

If instead on the first step we could deduce what the correct value
to XOR with the first byte of the message was, we could eliminate
those first four steps.  Of course we can: it's 0.

We can do this for successive steps as well.  The values that a
byte is XOR'd with as it goes through R depend solely on the part
of R "above" it; in other words if we introduce a 0 into R and only
XOR in the message byte at the end, just before the lookup in T,
rather than at the beginning, it doesn't actually change which
entry of T we look up for the next XOR.

(We are again relying on the associtivity of XOR.)

The following algorithm implements this, with the same
preconditions as before except that the padding bytes are no longer
necessary.

Algorithm 3.

1. Let R=0.
2. If B is now empty, the answer is R.  Stop.
3. Let S be the top 8 bits of R.
4. XOR S with the next byte of the message.
5. Shift R left by 8 bits.
6. XOR R with T[S].
7. Goto 2.

* CRC32

CRC32 has a polynomial of:

X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X+1

It also has the variation that the representation is backwards,
i.e. the MSB or a byte or word contains the coefficient of X^0; the
initial value of R is FFFFFFFF and the final result is XOR'd with
FFFFFFFF.  The binary representation of P comes out as EDB88320
(the X^32 coefficient being implicit).  This means that "left" and
"right" are reversed, as are "top" and "bottom".

crc32tab.pl

#! /usr/bin/perl -w
use strict;
use integer;

my $P = 0xEDB88320;

# this is just algorithm 1 from crc32.c, applied to each possible byte, with
# 32 bits of padding, which we handle implicitly - effectively we've bubbled
# the byte $b through 32 shifts before we enter the $bit loop.

for(my $b = 0; $b < 256; ++$b) {
  my $r = $b;
  # we want the highest coefficient first, this is the lowest bit number
  for(my $bit = 0; $bit < 8; ++$bit) {
    # bit 0 will be the bit that "pops out" of $r
    if($r & 1) {
      # multiplying through by X is a _right_ shift in this representation
      $r = ($r >> 1) & 0x7FFFFFFF;
      $r ^= $P;
    } else {
      $r = ($r >> 1) & 0x7FFFFFFF;
    }
  }
  if($b % 4 == 0) {
    print "  ";
  }
  printf "0x%08X,", $r;
  if($b % 4 == 3) {
    print "\n";
  } else {
    print " ";
  }
}

RJK | Contents