CSCI 241 Labs: Prelab 14
Data Encryption


Much of this document is borrowed from "Pascal Programming and Problem Solving, 4th ed." by Sanford Leestma and Larry Nyhoff, Macmillan Publishing Co, 1993.

Encryption refers to the coding of information in order to keep it secret. The string of characters comprising the information is called plaintext. It is transformed into another string by an algorithm that produces a coded form of the information called a cryptogram or ciphertext. The coded version may be safely stored or transmitted. At a later time it can be deciphered by reversing the encrypting process to recover the original information.

Data encryption has numerous uses in the real world.

  1. Governments can safely communicate with their embassies around the world.
  2. The military can safely communicate with widely dispersed units.
  3. Banks can (and do) safely transfer billions of dollars to other institutions daily.
  4. A CS professor here at UW-Parkside has his mother's Little Orphan Annie secret decoder ring, which she obtained by sending in box tops as a child.

 

Caesar Cipher

Data encryption techniques vary widely. The simplest encryption schemes are based on the string operation of substitution, in which each character in the plaintext string is replaced by some other character according to a fixed rule. For example, the Caesar Cipher scheme consists of replacing each letter by the letter that appears k positions later in the alphabet for some integer k, where k is a positive integer. (The alphabet is thought of as being arranged in a cycle, with A following Z.) In the original Caesar cipher, k had the value 3, so that each occurrence of A in the plaintext was replace by D, each B by E, . . ., each Y by B, and each Z by C. For example, we would encrypt the string "IDESOFMARCH" as follows:
    I D E S O F M A R C H      plaintext

                               add 3 to each character position to get:

    L G H V R I P D U F K      ciphertext

  

To decode the message, the receiver simply replaces each character in the cryptogram by the character which lies k positions earlier in the alphabet. This is obviously not a very secure scheme, since it is easy to break the code by simply trying the 25 possible values for the key value, k.

 

Vigenere Ciphers

It would certainly be harder to decode ciphertext that didn't have a constant offset (like the k = 3 in the Caesar Cipher). What if each original character could be offset by differing amounts? This can be done by using a keyword to specify several different displacements of letters rather than the single offset k. This is Vigenere cipher scheme. In this scheme, a keyword is added character by character to the plaintext string. Each resulting character is represented by its original position in the character set, plus the corresponding keyword letter position, and addition is carried out mod 26.

For example, if the positions of A, B, C, . . . ,Z are given by 0, 1, 2, . . ., 25 respectively, and the keyword is DAGGER, the message "IDESOFMARCH" is encrypted as follows:

    I  D  E  S  O  F  M  A  R  C  H     plaintext

    D  A  G  G  E  R  D  A  G  G  E     keyword, repeated if necessary

    L  D  K  Y  S  W  P  A  X  I  L     ciphertext



    8  3  4 18 14  5 12  0 17  2  7     position of original characters 
3 0 6 6 4 17 3 0 6 6 4 position of keyword character 11 3 10 24 18 22 15 0 23 8 11 (original+keyword) % 26

The receiver recovers the message by subtracting the characters in this keyword from those in the cryptogram.

 

Substitution Table Ciphers

Yet another substitution can be done by using a substitution table, which sets up the correspondence with which one character directly substitutes for the other. This is the type of puzzle seen in the newspaper section near the crossword.

For example:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Q W E R T Y U I O P A S D F G H J K L Z X C V B N M
The character in the second row of the table substitutes for the corresponding character in the first row.

The string "IDESOFMARCH" would be encrypted using this substitution table as follows:

    I D E S O F M A R C H      plaintext

    O R T L G Y D Q K E I      ciphertext

  

To decrypt, the receiver simply uses the substitution table in reverse.

Since there are 26! (26 factorial, or approximately 1028) possible substitution tables, this scheme is considerably more secure than the simple Caesar cipher scheme. Experienced cryptographers can easily break the code, however, by analyzing frequency counts of certain letters and combinations of letters.

 

Permutation Ciphers

Another basic string operation which can be used in encryption schemes is permutation, in which the characters in the plaintext are rearranged. You can also break the original string into smaller blocks of characters, and rearrange within each block.

For example, we might divide the message into blocks (substrings) of size 3 and permute the characters in each block as follows:

   Original Position:   1 2 3

Permuted Position: 3 1 2

Thus, the message "IDESOFMARCH" would be encrypted (after the addition of a randomly selected character X so that the string length is a multiple of the block length) as:

    I D E S O F M A R C H X     plaintext

    D E I O F S A R M H X C     ciphertext

  

To decrypt the cryptogram, the receiver must know the key permutation and its inverse.

   Cryptogram's Position:   1 2 3

Permuted Position: 2 3 1 (this is the original string)

Data Encryption Standard (DES)

Many modern encryption schemes use a combination of substitution and permutation operations. Perhaps the best known is the Data Encryption Standard (DES) developed in the early 1970s by researchers at the IBM Corporation. It consists essentially of a permutation followed by a sequence of 16 substitutions, followed by a final permutation. The substitution operations are similar to those in earlier examples. Some use keyword addition (16 different keywords), and others use substitution tables.

DES was adopted in 1977 by the National Institute of Standards and Technology as the standard encryption scheme for sensitive federal documents. It was revised in 1983, 1988 and 1993. In 2001, a new, even more complex standard known as the Advanced Encryption Standard (AES) was adopted.

 

Public Key Systems

Each of the preceding encryption schemes requires that both the sender and the receiver know the encryption key or keys. This means that although the cryptogram may be transmitted through an insecure public channel such as a telephone line, the key(s) must be transmitted separately in some secure manner, for example, by courier. This problem of maintaining secrecy of the key(s) is compounded when it(they) must be shared by several persons.

Public key encryption schemes eliminate this problem by using two keys, one for encryption and one for decryption. These schemes are called public key because the encryption key is made public by the receiver to all those who transmit message to him or her; the decryption key, however, is known only to the receiver. The security of these schemes depends on it being nearly impossible to determine the decryption key if one knows only the encryption key.

 

Rivest, Shamir, Adelman (RSA) Algorithm

In 1978, R.L. Rivest, A. Shamir, and L. Adelman proposed one method of implementing a public key encryption scheme. In their RSA algorithm, the public-key is a pair of integers (e,n). One encrypts a message string M by first dividing M into blocks, M1, M2, . . ., Mk and converting each Mi block of characters to an integer Pi, which is in the range 0 through n-1. M is then encrypted by raising each block to the power e and reducing by using modulo n:
    M = M1, M2, . . . Mk  becomes P1, P2, . . . Pk

    C = C1, C2, . . . Ck, where Ci = Pie % n

The cryptogram C is decrypted by raising each block Ci to the power d and reducing by using modulo n, where d is a secret decryption key.

To illustrate, suppose that the characters are converted to numeric values using the codes 0, 1, 2, . . . 25 for the letters A, B, C, . . ., Z, respectively, and the public-key pair (e, n) is (17, 2773). To encrypt the message M = "IDESOFMARCH" using the RSA algorithm, we divide M into two-character blocks M1, M2, . . ., M6. (We append a randomly selected character X to make the blocks come out evenly.) We represent each block Mi as an integer Pi. Pi is in the range 0 through 2772, and is found by concatenating the numeric codes of the characters that comprise the block:

     ID    ES   OF  MA   RC   HX      Mi

    0803 0418 1405 1200 1702 0723     Pi

    0779 1983 2641 1444 0052 0802     Ci

  
where 0779 = 080317 % 2773, etc.

For this encryption key, the corresponding decrypting key d = 157. Thus, we decode the cryptogram by calculating Pi = Ci157 % 2773.

Now it's time to discuss how we chose (17,2773) for our public-key pair (e,n). The number n is the product of two large "random" primes p and q:

 n = p*q

  
In the preceding example, we used the small primes 47 and 59 to simplify the computations (2773 = 47 * 59). However, Rivest, Shamir, and Adelman suggest that p and q have several hundred digits. The decrypting key d is then selected to be some large integer that is relatively prime to p-1 and q-1; that is, one that has no factors in common with either number. In our example, d = 157 has this property. The number e is then selected to have the property that
 e*d %((p-1)(q-1)) = 1

  
To break this code, one must be able to determine the value of d from values of n and e. Because of the manner in which d and e are selected, this is possible if n can be factored into a product of primes. Thus, the security of RSA encryption is based on the difficulty of determining the prime factors of a large integer. Even with the best factorization algorithms known today, this is a prohibitively time-consuming task. Although research on factorization continues, no efficient algorithms have been found that allow even super computers to factor numbers several hundred digits long in less than years.

 

Pretty Good Privacy (PGP)

The U.S. government takes encryption very seriously on two fronts. It wants to protect its sensitive information. It also wants to be able to decrypt supposedly private communications between individuals who might pose a threat to the U.S. In 1991, Phil Zimmerman wrote a freeware version of the RSA algorithm and called it Pretty Good Privacy (PGP). He was the target of a three-year criminal investigation, because the government held that US export restrictions for cryptographic software were violated when PGP spread all around the world. Despite the lack of funding, the lack of any paid staff, the lack of a company to stand behind it, and despite government persecution, PGP nonetheless became the most widely used email encryption software in the world. In 1996 the government finally dropped its case. Zimmerman founded PGP Corporation, which is currently a very highly regarded data security company.