There are many kinds of cryptosystems, or schemes for encoding and decoding information in this way. One example is a substitution cipher, in which one letter stands for a different one: 'A' might stand for 'F', 'P' for 'Y', and so on. You have probably seen this scheme presented as a puzzle in your Sunday newspaper; such ciphers are easy to break.
A more secure system is the RSA public-key cryptosystem. A public-key cryptosystem is one in which every participant has two keys, which act like passwords for encrypting and decrypting data: a public key and a private key. When someone wants to send you a message, they use your public key (which you have published somewhere) to encrypt the data. They send you the encrypted message, and you decrypt it with your secret private key. The public key is only good for encrypting, not decrypting, so communication is secure.
The RSA system is based on the difficulty of finding the prime factors of large numbers versus the relative ease of determining whether a large number is prime. Generating the keys for RSA is a matter of testing for primality. Encrypting a message requires knowing only a large integer n, not its factors; decrypting a message is dependent on knowing the factors. For numbers of 300 or more digits, finding the prime factors is impossible in a reasonable amount of time with today's best algorithms, and is likely to remain so in the future.
This is how the RSA system works. A participant creates his public and private keys like this:

1. Select two large primes p and q at random (each around 150 digits).
2. Compute n = pq.
3. Select a small odd integer e that is relatively prime to (p-1)(q-1).
4. Compute d, the multiplicative inverse of e modulo (p-1)(q-1).
5. Publish P = (e,n) as the public key, and keep S = (d,n) secret as the private key.
To encrypt a message block M with the public key P = (e,n), we compute M^e (mod n).
To decrypt a ciphertext (i.e., an encrypted block) C using the private key S = (d,n), we apply the reverse operation, computing C^d (mod n).
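To make this concrete, here is a minimal Python sketch of the whole cycle with deliberately tiny primes. The specific values p = 61, q = 53, e = 17, and M = 65 are chosen purely for illustration; real keys use primes of roughly 150 digits each.

    # Toy RSA key generation, encryption, and decryption.
    # WARNING: the primes here are absurdly small -- for illustration only.
    p, q = 61, 53                  # two (tiny) primes, normally ~150 digits each
    n = p * q                      # the modulus, part of both keys
    phi = (p - 1) * (q - 1)
    e = 17                         # public exponent, relatively prime to phi
    d = pow(e, -1, phi)            # private exponent: inverse of e mod phi (Python 3.8+)

    M = 65                         # a message block, must satisfy 0 <= M < n
    C = pow(M, e, n)               # encrypt:  C = M^e (mod n)
    M2 = pow(C, d, n)              # decrypt:  M = C^d (mod n)
    assert M2 == M                 # the round trip recovers the original block

The three-argument form of Python's pow performs the modular exponentiation efficiently; exactly how that is done is what we turn to next.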
There is a rich theory of modular arithmetic and number theory in general upon which RSA is based. We won't go into the complex issues involved there; rather, we'll look at some of the algorithms that go into implementing RSA and some of the algorithms that try to break it.
Both encryption and decryption boil down to modular exponentiation: computing a^b (mod n). Here is a naive first attempt:

    // returns a^b (mod n)
    Mod-Exp-1 (a, b, n)
        product = 1
        for i in 1..b do
            product = product * a
        end for
        return product % n

Let's analyze this algorithm. It does Θ(b) multiplications. Recall that in RSA, a could be a 300-digit number. The number of digits required to store a number a is Θ(log a); the number of digits required to store a^b is thus Θ(log a^b) = Θ(b log a). If b is also very large (as it could be in the case of decryption), we will require huge amounts of storage just to store that quantity. So if a and b are both 300-digit numbers, we will need around 300 · 10^300 = 3 · 10^302 digits to store product. This is substantially more than the number of elementary particles in the known universe, so it is unlikely to be successfully allocated by malloc().
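For reference, a direct Python transcription of Mod-Exp-1 might look like the sketch below (the function name is mine); it is only usable for tiny b, for exactly the storage reasons just described.

    def mod_exp_1(a, b, n):
        # Naive approach: build the full product a^b, reduce mod n only at the end.
        product = 1
        for _ in range(b):
            product = product * a   # product grows to roughly b * log10(a) digits
        return product % n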
An improvement that doesn't require such astronomical storage is to do the modulus operation each time through the loop, so that we can keep product small:
    // returns a^b (mod n)
    Mod-Exp-2 (a, b, n)
        product = 1
        for i in 1..b do
            product = product * a
            product = product % n
        end for
        return product

Now the number of digits in product can never exceed about twice the number of digits in n, so we need at most around 600 digits of storage. However, if b is still large, e.g., 300 digits, then the loop will have to iterate about 10^300 times. Assuming a (very generous) time of one nanosecond per loop iteration, this would take about 3 × 10^283 years to complete. The universe itself is estimated to be only about 1.4 × 10^10 years old, and will experience heat death due to the second law of thermodynamics long before 10^283 years from now, so this is also an unacceptable algorithm.
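In Python, the change is just moving the reduction inside the loop (again, the name mod_exp_2 is only for this sketch); the intermediate values stay small, but the iteration count is still b.

    def mod_exp_2(a, b, n):
        # Reduce mod n after every multiplication: the numbers stay small,
        # but the loop still runs b times.
        product = 1
        for _ in range(b):
            product = (product * a) % n
        return product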
Suppose you want to find 2^8. You could multiply out 2·2·2·2·2·2·2·2 = 256 one factor at a time, or simply square repeatedly log_2 8 = 3 times, getting 2^2 = 4, 4^2 = 16, 16^2 = 256. This method only works for exponents that are powers of 2 like 8, but there is an easy generalization that allows us to use any exponent. This yields the following algorithm for modular exponentiation:
    // returns a^b (mod n)
    Mod-Exp (a, b, n)
        product = 1
        y = a
        while b > 0 do
            if b is odd then
                product = (product * y) % n
            end if
            y = (y * y) % n
            b = b / 2
        end while
        return product

Let's try this on a small example, first without worrying about the modular part: let a = 2, b = 10.
          product      b        y
    init        1     10        2
                1      5        4
                4      2       16
                4      1      256
             1024      0    65536

(Here b = b / 2 is integer division, so 5 becomes 2.) y keeps track of the "current" repeated square of a: after i iterations, y = a^(2^i). If b is odd at some point, then there is a 1 bit at that position in the binary representation of b; this means there is an extra factor of a^(2^i) that wouldn't appear if b were a perfect power of two, so we multiply it into product. The final value of product is 1024 = 2^10, as expected.
The analysis is much less grim. Since we are dividing b by 2 each time, the loop can only execute about log_2 b times before b = 0. Each multiplication and modulus operation takes time proportional to the square of the number of digits involved, which is bounded above by log n. So the whole loop takes time O(log b · log^2 n). For 300-digit numbers, this takes a few seconds or less on an average computer.
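Here is a straightforward Python rendering of Mod-Exp. Python's built-in three-argument pow(a, b, n) computes the same quantity using the same idea, so we can sanity-check the translation against it.

    def mod_exp(a, b, n):
        # Square-and-multiply: computes a^b (mod n) in O(log b) multiplications.
        product = 1
        y = a % n
        while b > 0:
            if b % 2 == 1:                 # this bit of b is 1: multiply in y = a^(2^i)
                product = (product * y) % n
            y = (y * y) % n                # next repeated square of a
            b //= 2                        # shift b right by one bit
        return product

    assert mod_exp(2, 10, 10**6) == 1024
    assert mod_exp(123456789, 987654321, 10**9 + 7) == pow(123456789, 987654321, 10**9 + 7)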
How do we determine whether a number p that we suspect is prime is actually prime? There is a theorem in number theory called Fermat's Little Theorem (not to be confused with his famous "Last Theorem" that was proved only recently): If p is prime, then
    a^(p-1) = 1 (mod p)    for all positive integers a < p.

For some bases a, you will find some composite numbers n that also satisfy a^(n-1) = 1 (mod n); however, whether the congruence holds for one value of a is largely independent of whether it holds for a different value of a. Thus, the more values of a we choose and use to verify a^(p-1) = 1 (mod p), the more sure we are that p is in fact prime. This assuredness (i.e., the probability that p is prime) roughly doubles for each a we try, and is very high to begin with, so with just a few values of a we can be almost certain that p is a prime suitable for use with RSA. If it isn't, we'll find out quickly enough when the algorithm fails (but in practice this essentially never happens, because we can make the probability that p is not prime arbitrarily low).
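Both the theorem and the caveat are easy to see with Python's pow: 7 is prime, so every base a < 7 satisfies the congruence, while the composite number 341 = 11 · 31 happens to satisfy it for the base 2 but is exposed by the base 3.

    # Fermat's Little Theorem: p = 7 is prime, so a^6 = 1 (mod 7) for all a < 7.
    assert all(pow(a, 6, 7) == 1 for a in range(1, 7))

    # 341 = 11 * 31 is composite, yet it passes the test for the base 2 ...
    assert pow(2, 340, 341) == 1
    # ... but the base 3 exposes it (the result is 56, not 1).
    assert pow(3, 340, 341) != 1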
So the following algorithm returns False if it detects that its parameter is composite, and True if the parameter is "probably" prime:
    Probably-Prime (p)
        for a in 2..20 do
            if a^(p-1) != 1 (mod p) then
                return False
            end if
        end for
        return True

We can simply use the modular exponentiation algorithm from above to compute each a^(p-1) (mod p) efficiently.
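A Python version might look like the sketch below; it uses the built-in pow, which performs fast modular exponentiation, and the fixed range of bases 2..20 simply mirrors the pseudocode.

    def probably_prime(p):
        # Fermat test with bases 2..20: False means definitely composite,
        # True means p is prime with very high probability.
        if p < 2:
            return False
        for a in range(2, min(21, p)):     # bases 2..20 (fewer if p itself is tiny)
            if pow(a, p - 1, p) != 1:      # the test fails: p is certainly composite
                return False
        return True

    assert probably_prime(2**127 - 1)      # a well-known (Mersenne) prime passes
    assert not probably_prime(91)          # 91 = 7 * 13 is caught immediately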
A simple algorithm is trial division: to factor n, try dividing it by every number from 2..sqrt(n); if there is ever a zero remainder, then you have found a factor. If k is the number of digits in n (i.e., about log_10 n), then this algorithm takes O(10^(k/2)) divisions. For a 300-digit number, this is about 10^150, another one of those astronomical numbers.
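A Python sketch of trial division (fine for small n, hopeless for 300-digit numbers):

    def trial_division(n):
        # Return the smallest factor of n greater than 1, or None if n is prime.
        d = 2
        while d * d <= n:          # only divisors up to sqrt(n) need to be checked
            if n % d == 0:
                return d           # found a factor, so n is composite
            d += 1
        return None                # no divisor found: n is prime

    assert trial_division(3233) == 53     # 3233 = 53 * 61, the toy modulus from the RSA example
    assert trial_division(101) is None    # 101 is prime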
There are other algorithms that perform much better. The Number Field Sieve is arguably the best algorithm known. It runs in time roughly exp(c (ln n)^(1/3) (ln ln n)^(2/3)) for a small constant c, which is much better than 10^(k/2) but still superpolynomial. It took a distributed effort of thousands of computers running a highly tuned implementation of this kind of algorithm almost a whole year to find the prime factors of one 129-digit number that was proposed as a challenge in the late 1970s by the authors of RSA. In recent years, the challenge numbers RSA-576 and RSA-640 were factored. It took 30 CPU-years on 2.2 GHz AMD Opterons to factor RSA-640; the team that did this won $20,000 from RSA Laboratories. RSA-704 and larger challenge numbers remain unfactored. There is a $200,000 prize for factoring RSA-2048.