Thursday, September 16, 2010

Binary addition of very long numbers


"To non-mathematicians we point out that it is a commonplace of modern algebra for symbols to represent concepts other than the numbers", from An Algebra for Theoretical Genetics, Claude Shannon's 1940 PhD thesis at MIT


Computers represent integers as a series of bits, 32 bits in a row on my machine. Some programming languages offer longer integers. For example, in C++ the longer data types(1) use two of my 32-bit chunks for a total of 64 bits. This allows representation of 2^64 different integers -- perhaps the non-negative integers from 0 to 18,446,744,073,709,551,615. 
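For concreteness, here is that 64-bit ceiling in C++ -- a small illustrative sketch of my own, not code from any particular program:

```cpp
#include <cstdint>
#include <limits>

// The widest built-in unsigned type on a typical 64-bit setup: uint64_t.
// Its 64 bits represent 2^64 distinct values, from 0 up to
// 18,446,744,073,709,551,615.
std::uint64_t largestUint64() {
    return std::numeric_limits<std::uint64_t>::max();
}
```

Adding 1 to this value wraps around to 0, since unsigned overflow is well-defined modular arithmetic in C++.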

I recently found a good excuse (the Collatz conjecture) to add integers that are much longer than this, perhaps a thousand bits or digits long. No built-in data type is this big, and built-in math operations like addition and multiplication only work on the standard data types.
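The Collatz rule itself is tiny -- one step either halves a number or roughly triples it. A one-line sketch (my own illustration) shows why fixed-width types run out of room on the 3n + 1 branch:

```cpp
#include <cstdint>

// One step of the Collatz rule: n -> n/2 if n is even, 3n + 1 if n is odd.
// With a fixed-width integer, 3n + 1 overflows once n nears 2^64 / 3,
// which is why exploring long trajectories calls for arbitrary-length
// arithmetic.
std::uint64_t collatzStep(std::uint64_t n) {
    return (n % 2 == 0) ? n / 2 : 3 * n + 1;
}
```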

I could have found an arbitrary precision library for the programming language of my choice, but I knew it would take me a while to learn the ins and outs of how to use the library. So I wrote my own function that does addition of arbitrarily long strings of characters. This was easy, fun and satisfying -- the sky's the limit when you can add numbers greater than 10^1000. The entire universe contains fewer than 10^100 fundamental particles, so I'll need something larger to count. Maybe I can count the number of ways that the particles of the universe can be arranged. Maybe not; do I have enough memory to store my long strings? After all, a bit of memory must require a few particles. Or does it?

It was easier than I thought. I used strings to represent the integers: one-dimensional arrays of the character (char) data type. They are easy to read (the number 1 reads as '1') and easy to manipulate with built-in string functions. This is inefficient, because it takes 8 bits to represent each character and a function that can compare 8-bit characters, but that wasn't a concern. It only takes a few thousand operations to add a pair of 1000-bit numbers, and that is plenty fast for my purposes. I figure each addition takes about .00001 seconds, which works out to roughly 100,000 additions/second.

What's the algorithm? We learned this in third grade: line up the two numbers, fill in imaginary 0's in empty spots, and, starting from the right, add single digits (we memorized these single-digit sums in second grade), add any carry digit, and carry a 1 to the left if the sum is 10 or larger:

   11   (carry digits)
  1359
+ 0248
  ----
  1607
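The grade-school procedure translates almost word for word into code on strings of decimal digits. This is my own illustrative sketch, not the post's original function:

```cpp
#include <algorithm>
#include <string>

// Add two non-negative integers represented as strings of decimal digits,
// the way we line numbers up on paper: rightmost digits first, carrying a
// 1 to the left whenever a column sums to 10 or more.
std::string addDecimalStrings(const std::string& a, const std::string& b) {
    std::string answer;
    int i = a.size() - 1;  // current column in a, starting at the right
    int j = b.size() - 1;  // current column in b
    int carry = 0;

    while (i >= 0 || j >= 0 || carry != 0) {
        int column = carry;
        if (i >= 0) column += a[i--] - '0';  // missing digits act as 0
        if (j >= 0) column += b[j--] - '0';
        answer.push_back('0' + column % 10);  // digit written in this column
        carry = column / 10;                  // 1 if the column overflowed
    }
    std::reverse(answer.begin(), answer.end());
    return answer;
}
```

With the worked example above, addDecimalStrings("1359", "248") gives "1607".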

It would be no big deal to implement this algorithm on a string of characters from the set ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], but I wanted to see the binary numbers and only needed two characters, '0' and '1'.

The algorithm for binary integers is slightly simpler because the addition table is shorter: we only need to know that

if there is no carry bit (0):
0 + 0 = 0 carry 0
0 + 1 = 1 carry 0
1 + 0 = 1 carry 0
1 + 1 = 0 carry the 1

and if there is a carry bit (1):
0 + 0 = 1 carry 0
0 + 1 = 0 carry the 1
1 + 0 = 0 carry the 1
1 + 1 = 1 carry the 1
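Those eight rows are exactly the truth table of a one-bit full adder. As a sketch (the function name is mine), the whole table condenses into a few lines:

```cpp
#include <utility>

// One-bit full adder over the characters '0' and '1'.
// Returns (sum bit, carry-out bit), matching the eight table rows above:
// the sum bit is the low bit of the column total, and a carry goes out
// whenever two or more ones meet in a column.
std::pair<char, char> addBits(char b1, char b2, char carry) {
    int total = (b1 - '0') + (b2 - '0') + (carry - '0');  // 0, 1, 2 or 3
    char sum = (total % 2) ? '1' : '0';
    char carryOut = (total >= 2) ? '1' : '0';
    return {sum, carryOut};
}
```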

Given two strings that represent two binary integers, for example '11011' and '110110', which represent the decimal integers 27 and 54:

 (1111100) carry bits
  0011011
+ 0110110
  -------
  1010001

The pseudocode is something like this:

{ given string arguments str1 and str2
  that represent binary integers, padded
  with leading '0's to the same length

  // get the right-hand character of each input string
     b1 and b2
  // store the results, starting with an empty string
     answer = ''
  // and keep track of the carry bit, starting with 0
     carry = '0'

  while not at left end of either string {
     
     If carry == '0' {
          if b1 == '0'{
               if b2 == '0' {
                    append '0' to the left of answer
                    carry = '0'
               }
               else b2 == '1' {
                    append '1' to the left of answer
                    carry = '0'
               }
          }
          else b1 == '1' {
               if b2 == '0' {
                    append '1' to the left of answer
                    carry = '0'
               }
               else b2 == '1' {
                    append '0' to the left of answer
                    carry = '1'
               }
          }
     }
     else carry == '1' {
          if b1 == '0' {
               if b2 == '0' {
                    append '1' to the left of answer
                     carry = '0'
               }
               else b2 == '1' {
                    append '0' to the left of answer
                    carry = '1'
               }
          }
          else b1 == '1'{
               if b2 == '0' {
                     append '0' to the left of answer
                     carry = '1'
               }
               else b2 == '1' {
                     append '1' to the left of answer
                    carry = '1'
               }
          }
     }
                    
         // get the next two bits from the two input strings
         next b1 and b2

   }

   // if a carry remains, it becomes a new leftmost bit
   if carry == '1' {
        append '1' to the left of answer
   }

   return answer
}
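Here is one way the pseudocode might look as working C++ -- a sketch under my own names, not the post's original program. It also handles inputs of unequal length and a final carry out of the leftmost column:

```cpp
#include <algorithm>
#include <string>

// Add two binary integers represented as strings of '0' and '1' characters.
// A shorter input behaves as if padded with leading '0's, and a final
// carry, if any, becomes a new leftmost bit.
std::string addBinaryStrings(const std::string& str1, const std::string& str2) {
    std::string answer;
    int i = str1.size() - 1;  // rightmost bit of str1
    int j = str2.size() - 1;  // rightmost bit of str2
    char carry = '0';

    while (i >= 0 || j >= 0) {
        char b1 = (i >= 0) ? str1[i--] : '0';  // imaginary leading 0's
        char b2 = (j >= 0) ? str2[j--] : '0';
        int total = (b1 - '0') + (b2 - '0') + (carry - '0');
        answer.push_back(total % 2 ? '1' : '0');  // bit for this column
        carry = (total >= 2) ? '1' : '0';         // carry into the next column
    }
    if (carry == '1') answer.push_back('1');  // overflow into a new column
    std::reverse(answer.begin(), answer.end());
    return answer;
}
```

On the worked example, addBinaryStrings("11011", "110110") returns "1010001", the binary form of 81.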

I was introduced to programming in the 1970s, in an era dominated by electronic calculators. For a long while I thought of computation as fundamentally about numbers. How could computers do anything with text, when they understand nothing about the meaning of letters and words? But I'm slowly changing my point of view. Now I think of computation as pure symbol manipulation, where computers understand nothing about the meaning of numbers or text. 

Any meaning, with respect to numbers or text, is supplied by control statements (while..., if...else) and the conditions they check (0 or 1, true or false, same or different). This comparison of two bits, with the result of same or different, is one of the few core things a computer knows how to do without software, as it is hardcoded into the CPU's hardware and microcode.

This program has made symbol manipulation more concrete for me: it takes letters (abstract symbols), adds them, and outputs letters (more abstract symbols). I could replace every '1' with 'a' and every '0' with 'b', and the program would still add two binary numbers:

   aabaa
+ aabaab
  ------
 ababbba
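To make the point explicit, the addition can be parameterized over any pair of symbols. This is my own sketch; with 'a' playing the role of 1 and 'b' the role of 0, it reproduces the sum shown above:

```cpp
#include <algorithm>
#include <string>

// Binary addition over an arbitrary pair of symbols: `zero` plays the
// role of '0' and `one` plays the role of '1'. The algorithm never asks
// what the symbols mean -- only whether a given symbol matches `one`.
std::string addSymbolStrings(const std::string& a, const std::string& b,
                             char zero, char one) {
    std::string answer;
    int i = a.size() - 1;
    int j = b.size() - 1;
    int carry = 0;

    while (i >= 0 || j >= 0) {
        int x = (i >= 0 && a[i--] == one) ? 1 : 0;  // exhausted input acts as zero
        int y = (j >= 0 && b[j--] == one) ? 1 : 0;
        int total = x + y + carry;
        answer.push_back(total % 2 ? one : zero);
        carry = total / 2;
    }
    if (carry) answer.push_back(one);
    std::reverse(answer.begin(), answer.end());
    return answer;
}
```

Calling addSymbolStrings("aabaa", "aabaab", 'b', 'a') yields "ababbba", and the same function with '0' and '1' reproduces ordinary binary addition.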

I could just as well replace the letters with two pictures and add those. The meaning of the algorithm would be the same. I think I will choose black and white square images. I'll call them pixels, and add strange arrays of pixels with ambiguous meaning.




(1) These types have many names: doubleword, longword, long long, quad, quadword, int64, or in specific languages, C/C++ long (on 64-bit Unix), C/C++ long long, C/C++ uint64_t and int64_t, C# long and ulong, Delphi Int64, T-SQL bigint.
