(This section is taken from an earlier draft of squozen_code.html where I was using nbits() instead of P().)

We can ask, how big is the message after the first gap s is removed. At that point the number of gaps decreases by one, and the total of the gap widths decreases by s. Subtracting the two sizes gives the number of bits it took to encode s. Let's call this function nbitsC() for Combinatorics:

               (1000000 +232-1)!         (999999 +232-1-s)!
nbitsC(s)=log2(-----------------) - log2(-----------------)
                1000000!(232-1)!          999999!(232-1-s)!
Of course this is a division (the inverse of the fraction of messages that start with s):
               (1000000 +232-1)!  999999!(232-1-s)!
nbitsC(s)=log2(-----------------------------------)
                1000000!(232-1)! (999999 +232-1-s)!       
Separating the first factor of the first factorial...
       (1000000 +232-1) (999999 +232-1)!  999999!(232-1-s)!
= log2(---------------------------------------------------)
        1000000!(232-1)!                 (999999 +232-1-s)!       
Canceling and rearranging a bit...
       1000000+232-1         (999999 +232-1  )! (232-1-s)!
= log2(-------------) + log2(----------------------------)
          1000000            (999999 +232-1-s)! (232-1  )!
When s = 0, the right term is zero, and the constant on the left is the number of "overhead" bits per codeword:
                 1000000+232-1
nbitsC(0) = log2(-------------) ~= 12.0687673
                   1000000 
When s is small, the right term is approximately
                            999999 +232-1-(s+1)/2
nbitsC(s)-12.0687673 = log2(---------------------) * s
                               232-1-(s+1)/2
which is fairly linear until s gets close to 232, as the following table (calculated
here) shows:

s

nbitsC(s)

(nbitsC(s) - nbitsC(0)) / s

nbitsC(M) - nbitsC(s)

12.069 0.00033586418096 13511271.040
10  12.072 0.00033586418131 13511271.037
100  12.102 0.00033586418483 13511271.007
1000  12.405 0.00033586422002 13511270.704
4295* 13.511 0.00033586434884 13511269.598
10000  15.427 0.00033586457187 13511267.682
100000  45.656 0.00033586809049 13511237.453
1000000  347.972 0.00033590328193 13510935.137
10000000  3374.626 0.00033625574033 13507908.483
100000000  33995.614 0.00033983545542 13477287.495
1000000000  382343.723 0.00038233165443 13128939.386
2000000000  904038.383 0.00045201315694 12607244.726
3000000000  1729352.864 0.00057644693179 11781930.245
232-1000000001  2102103.942 0.00063797048202 11409179.167
232-100000001  5417560.276 0.00129143991514 8093722.833
232-10000001  8676830.938 0.00202494401275 4834452.171
232-1000001  11511294.901 0.00268080356501 1999988.208
232-100001  13027846.123 0.00303334961462 483436.986
232-10001  13430353.678 0.00312700236274 80929.431
232-1001  13499880.218 0.00314318372302 11402.891
232-101  13509814.710 0.00314549611858 1468.399
232-11  13511105.584 0.00314579660777 177.525
232-2  13511263.178 0.00314583329368 19.932
232-1  13511283.109 0.00314583793363 0.000
*the average gap width

This page was made with a php script and a Python program. Last change November 06, 2009.

--Steve Witham

Up to