Introducing Cryptanalysis

To understand why serious encryption algorithms are so complex, and why it's
so important to be careful with the critical secrets that make an encryption
system work, it's useful to understand something about how people break
encryption systems. The study of this is called cryptanalysis, and it's
an amazingly fascinating field of applied mathematics. I'm going to be
interspersing information about cryptanalysis with my cryptography posts. One
thing to remember here is that we'll be talking about it mainly in the context
of how you can break an encryption system - but cryptanalysis is also used for
designing cryposystems, because you can only design a successful cryptosystem by
thinking about how it can defeat the ways that it could be broken.

One caveat: I'm going to be describing cryptanalysis in terms of how I understand it, which is sometimes different from classical descriptions by cryptanalysists. My
understanding is strongly rooted in computation and information theory, rather than pure math. So sometimes my presentation will be a bit different, but hopefully by staying in the ground where I'm most comfortable, I can do a better job of making it comprehensible.

To be a bit formal for a moment, a simple cryptosystem consists of
two functions:

  1. C = En(P, K)
  2. P = De(C, K)

Where C is encrypted text (called the cipher-text), P is the unencrypted text (called the clear-text), K is the secret key at the heart of the system. We typically think of K as being something like a password, but it doesn't have to be - in a simple substitution cipher, K is the mapping from clear-text characters to cipher characters.

A good cryptosystem is one where it's easy to generate the cipher-text given
the clear-text and the secret key; easy to generate the clear-text given the
cipher-text and the secret key; and very hard to figure out the secret key, even
given a large number of clear-text/cipher-text pairs.

When we analyze cryptosystems to try to divine their secrets, we
can look at them in terms of a number of different attack scenarios. What I mean by attack scenario is that we're looking at a cryptosystem
that we want to break. An attack scenario is a way of describing
what kind of information the agent trying to crack the cryptosystem
has access to. For example, some of the common scenarios discussed
include:

  1. Cipher-only: we can watch encrypted messages sent from one user
    to another, giving us a large library of encrypted text. But we don't know
    any of the clear-text.
  2. Known clear-text: we have a library of encrypted text, like we did in
    cipher-only, and in addition, we have some collection of clear-text/cipher-text
    pairs.
  3. Chosen clear-text: we have a library of encrypted text, like we did in
    cipher-only, and in addition, we can select a finite set of clear-texts,
    and have them encrypted.
  4. Iterated chosen clear-text (also sometimes called "black-box"):
    like chosen clear-text, but instead of
    only being able to select a set of texts exactly once, we can submit
    a set of texts, get back the cipher-text, and then based on what we learn
    from earlier sets of plain/cipher-text pairs, we can select more sets of
    clear-text to encrypt.

These scenarios all assume that you know the basic cryptosystem. In
general, that's a reasonable assumption. If you don't have a clue
about what kind of cryptosystem was used to encrypt a cipher-text,
things get much harder. There are some tricks that you can use
to make a guess at what kind of cryptosystem was used, but for non-trivial
cryptography, you're often pretty much honked if you don't have any idea of
what sort of crypto was used.

To give you a sense of what it's like to attack a cryptosystem,
I'll walk through a simple example. Below is a simple cipher-text,
encrypted using a simple substititution, with spaces and punctuation
preserved.

b czfbczc bh bu gqlvh hbxz hl uhgih vd xt lkp qrly. lpz
hjbpy hjgh b jgaz plhbfzc bp xt hbxz qilkubpy hjz qrlyludjziz bu hjgh hjziz giz
g kjlrz rlh lm hziibmbf ufbzpfz qrlyu lvh hjziz: ligf, djgitpyvrg, gzhblrlyt,
evuh hl pgxz g mzk. qvh hjziz bu plh pzgirt ul xvfj lvh hjziz czcbfghzc hl xghj
- gpc bp dgihbfvrgi hl hjz xbuvuz lm xghj. b hjbpw hjgh hjgh bu g cgxp ujgxz,
qzfgvuz bp xt zodzibzpfz, lpz lm hjz xluh frzgi kgtu lm bczphbmtbpy g figfwdlh
bu hjilvyj xghj. pl xghhzi hjz udzfbmbf uvqezfh, hjz figfwdlhu grkgtu zbhjzi
galbc li ufizk vd hjz xghj. kjzhjzi bh'u hjz "xzifvit fgvuzu gvhbux" mlrwu, hjz
azrbwlauwbgpu, fizghblpbuhu, grh-xzcbfbpz svgfwu, izdvqrbfgp dlrruhziu, li
ufbzphlrlybuhu - tlv fgp grkgtu izflypbnz hjz figfwdlhu qt hjzbi xghj. ul b gx
ylbpy hl cl xt qzuh hl dilabcz g albfz lm xghjzxghbfgr ugpbht - qlhj qt ujlkbpy
kjgh'u kilpy kbhj hjz qgc xghj urld dvxdzc lvh qt hjz rllpbzu, gpc qt ujlkbpy
jlk yllc xghj kliwu.

The first question - and this is always the first question in
an attempt to break an encryption - is: "What do we know?"

In this case, we know quite a lot:

  1. The clear-text is english text.
  2. The encryption is a simple substitution, which preserves
    spaces and punctuation, but not case.

That might not look like a lot, but it's actually huge. Knowing
that the clear-text is english gives us a huge amount of information
that we can use to break it. Knowing that work-breaks are preserved
makes it nearly trivial to solve.

To break it, we'll use a simple form of frequency analysis. Frequency analysis is a great technique for attacking simple ciphers. The
idea of it is that we know a huge amount of information about patterns
in human languages. Different letters don't appear equally frequently in
English; for example, there are a lot more "E"s than "U"s.

A lot of people mistakenly think that frequency analysis means only using the specific frequency of individual letters. That's not true. A good
frequency analysis based attack on a cryptosystem uses not just the frequency of
particular letters, but of groups of letters, relationships between
letters, and frequencies of words.

So let's start deciphering that message. To try to make it easier
to see how we're decrypting, as we figure out letters, I'll write
the message with the clear-text and the cipher-text tangled - clear-text
letters will be uppercase, and cipher-text letters will be lowercase. When
we figure out that a cipher-text letter "a" encodes to a particular
clear-text letter "B", we'll write that as "a/B".

We'll start off with a nice word-based frequency trick. There are only two
common one-letter words in english: "A" and "I", and "I" occurs more frequently
at the beginning of a sentence than "A". So there's a good chance that a single
letter word that occurs frequently, including at the beginning of sentences is
"I". So we'll try that here: there are two one-letter words in the cipher-text:
"b" and "g", and "b" occurs at the beginning of sentences. So we'll try "B/I"
and "G/A":

I czfIczc Ih Iu Aqlvh hIxz hl uhAih vd xt lkp qrly. lpz
hjIpy hjAh I jAaz plhIfzc Ip xt hIxz qilkuIpy hjz qrlyludjziz Iu hjAh hjziz Aiz
A kjlrz rlh lm hziiImIf ufIzpfz qrlyu lvh hjziz: liAf, djAitpyvrA, AzhIlrlyt,
evuh hl pAxz A mzk. qvh hjziz Iu plh pzAirt ul xvfj lvh hjziz czcIfAhzc hl xAhj
- Apc Ip dAihIfvrAi hl hjz xIuvuz lm xAhj. I hjIpw hjAh hjAh Iu A cAxp ujAxz,
qzfAvuz Ip xt zodziIzpfz, lpz lm hjz xluh frzAi kAtu lm IczphImtIpy A fiAfwdlh
Iu hjilvyj xAhj. pl xAhhzi hjz udzfImIf uvqezfh, hjz fiAfwdlhu ArkAtu zIhjzi
AalIc li ufizk vd hjz xAhj. kjzhjzi Ih'u hjz "xzifvit fAvuzu AvhIux" mlrwu, hjz
azrIwlauwIApu, fizAhIlpIuhu, Arh-xzcIfIpz svAfwu, izdvqrIfAp dlrruhziu, li
ufIzphlrlyIuhu - tlv fAp ArkAtu izflypInz hjz fiAfwdlhu qt hjzIi xAhj. ul I Ax
ylIpy hl cl xt qzuh hl dilaIcz A alIfz lm xAhjzxAhIfAr uApIht - qlhj qt ujlkIpy
kjAh'u kilpy kIhj hjz qAc xAhj urld dvxdzc lvh qt hjz rllpIzu, Apc qt ujlkIpy
jlk yllc xAhj kliwu.

Now, in the first sentence, we see two two-letter words side by side starting with "I". There are only three common two-letter words starting with "I": "if", "is", and "it". Since it's a pair of words, we can consider the
different permutations: "it is", "it if", "if it", "if is", "is if", and "is it". "it if" and "if is" are both very unlikely. We can't discard them out
of hand, but we can consider them very unlikely. "is it" would usually imply
a question, but there's no question mark. So it's probably either "if it", or "it is". Looking into the next line, where we see "hjAh", we can say that "h/F" is very unlikely, because there are very few words in english that start and end with an "f", but a lot that start and end with a T. So we'll guess that the phrase is "it is", and "h/T" and "u/S".

I czfIczc IT IS AqlvT TIxz Tl STAiT vd xt lkp qrly. lpz
TjIpy TjAT I jAaz plTIfzc Ip xt TIxz qilkSIpy Tjz qrlylSdjziz IS TjAT Tjziz Aiz
A kjlrz rlT lm TziiImIf SfIzpfz qrlyS lvT Tjziz: liAf, djAitpyvrA, AzTIlrlyt,
evST Tl pAxz A mzk. qvT Tjziz IS plT pzAirt Sl xvfj lvT Tjziz czcIfATzc Tl xATj
- Apc Ip dAiTIfvrAi Tl Tjz xISvSz lm xATj. I TjIpw TjAT TjAT IS A cAxp SjAxz,
qzfAvSz Ip xt zodziIzpfz, lpz lm Tjz xlST frzAi kAtS lm IczpTImtIpy A fiAfwdlT
IS Tjilvyj xATj. pl xATTzi Tjz SdzfImIf SvqezfT, Tjz fiAfwdlTS ArkAtS zITjzi
AalIc li Sfizk vd Tjz xATj. kjzTjzi IT'S Tjz "xzifvit fAvSzS AvTISx" mlrwS, Tjz
azrIwlaSwIApS, fizATIlpISTS, ArT-xzcIfIpz svAfwS, izdvqrIfAp dlrrSTziS, li
SfIzpTlrlyISTS - tlv fAp ArkAtS izflypInz Tjz fiAfwdlTS qt TjzIi xATj. Sl I Ax
ylIpy Tl cl xt qzST Tl dilaIcz A alIfz lm xATjzxATIfAr SApITt - qlTj qt SjlkIpy
kjAT'S kilpy kITj Tjz qAc xATj Srld dvxdzc lvT qt Tjz rllpIzS, Apc qt SjlkIpy
jlk yllc xATj kliwS.

Now, we see "Tl" in the first line. There's only one common two-letter
word starting with "T": "to". So we'll do "l/O". We also see "STAiT",
which looks like it's probably "start"; and we see that "i" occurs very frequently in the cipher-text; "R" is one of the most common consonants in english. So we'll also add "i/R".

I czfIczc IT IS AqOvT TIxz TO START vd xt Okp qrOy. Opz
TjIpy TjAT I jAaz pOTIfzc Ip xt TIxz qROkSIpy Tjz qrOyOSdjzRz IS TjAT TjzRz ARz
A kjOrz rOT Om TzRRImIf SfIzpfz qrOyS OvT TjzRz: ORAf, djARtpyvrA, AzTIOrOyt,
evST TO pAxz A mzk. qvT TjzRz IS pOT pzARrt SO xvfj OvT TjzRz czcIfATzc TO xATj
- Apc Ip dARTIfvrAR TO Tjz xISvSz Om xATj. I TjIpw TjAT TjAT IS A cAxp SjAxz,
qzfAvSz Ip xt zodzRIzpfz, Opz Om Tjz xOST frzAR kAtS Om IczpTImtIpy A fRAfwdOT
IS TjROvyj xATj. pO xATTzR Tjz SdzfImIf SvqezfT, Tjz fRAfwdOTS ArkAtS zITjzR
AaOIc OR SfRzk vd Tjz xATj. kjzTjzR IT'S Tjz "xzRfvRt fAvSzS AvTISx" mOrwS, Tjz
azrIwOaSwIApS, fRzATIOpISTS, ArT-xzcIfIpz svAfwS, RzdvqrIfAp dOrrSTzRS, OR
SfIzpTOrOyISTS - tOv fAp ArkAtS RzfOypInz Tjz fRAfwdOTS qt TjzIR xATj. SO I Ax
yOIpy TO cO xt qzST TO dROaIcz A aOIfz Om xATjzxATIfAr SApITt - qOTj qt SjOkIpy
kjAT'S kROpy kITj Tjz qAc xATj SrOd dvxdzc OvT qt Tjz rOOpIzS, Apc qt SjOkIpy
jOk yOOc xATj kORwS.

We're starting to see some really good progress. Some short words are
starting to pop up, and lots of places where the correct word is pretty obvious.
For example, "TjAT" is clearly "THAT", so we know "j/H". And we see lots of
three letter words, starting with "T". The most common three-letter word
starting with "T" is "the", and we see lots of three-letter words with "T"
followed by "j"; if our guess that "j" encodes "H" is correct, that means that
"z" almost certainly encodes "E".

 I cEfIcEc IT IS AqOvT TIxE TO START vd xt Okp qrOy. OpE THIpy THAT I HAaE
pOTIfEc Ip xt TIxE qROkSIpy THE qrOyOSdHERE IS THAT THERE ARE A kHOrE rOT Om
TERRImIf SfIEpfE qrOyS OvT THERE: ORAf, dHARtpyvrA, AETIOrOyt, evST TO pAxE A
mEk. qvT THERE IS pOT pEARrt SO xvfH OvT THERE cEcIfATEc TO xATH - Apc Ip
dARTIfvrAR TO THE xISvSE Om xATH. I THIpw THAT THAT IS A cAxp SHAxE, qEfAvSE Ip
xt EodERIEpfE, OpE Om THE xOST frEAR kAtS Om IcEpTImtIpy A fRAfwdOT IS THROvyH
xATH. pO xATTER THE SdEfImIf SvqeEfT, THE fRAfwdOTS ArkAtS EITHER AaOIc OR SfREk
vd THE xATH. kHETHER IT'S THE "xERfvRt fAvSES AvTISx" mOrwS, THE aErIwOaSwIApS,
fREATIOpISTS, ArT-xEcIfIpE svAfwS, REdvqrIfAp dOrrSTERS, OR SfIEpTOrOyISTS - tOv
fAp ArkAtS REfOypInE THE fRAfwdOTS qt THEIR xATH. SO I Ax yOIpy TO cO xt qEST TO
dROaIcE A aOIfE Om xATHExATIfAr SApITt - qOTH qt SHOkIpy kHAT'S kROpy kITH THE
qAc xATH SrOd dvxdEc OvT qt THE rOOpIES, Apc qt SHOkIpy HOk yOOc xATH kORwS.

Looking very good. We can see from things like "OvT" that "v" is cipher for "U"; from "HAaE", we can guess that "a" is cipher for "V"; from "OpE", we can guess that "p" is cipher for "N".

I cEfIcEc IT IS AqOUT TIxE TO START Ud xt OkN qrOy. ONE
THINy THAT I HAVE NOTIfEc IN xt TIxE qROkSINy THE qrOyOSdHERE IS THAT THERE ARE
A kHOrE rOT Om TERRImIf SfIENfE qrOyS OUT THERE: ORAf, dHARtNyUrA, AETIOrOyt,
eUST TO NAxE A mEk. qUT THERE IS NOT NEARrt SO xUfH OUT THERE cEcIfATEc TO xATH
- ANc IN dARTIfUrAR TO THE xISUSE Om xATH. I THINw THAT THAT IS A cAxN SHAxE,
qEfAUSE IN xt EodERIENfE, ONE Om THE xOST frEAR kAtS Om IcENTImtINy A fRAfwdOT
IS THROUyH xATH. NO xATTER THE SdEfImIf SUqeEfT, THE fRAfwdOTS ArkAtS EITHER
AVOIc OR SfREk Ud THE xATH. kHETHER IT'S THE "xERfURt fAUSES AUTISx" mOrwS, THE
VErIwOVSwIANS, fREATIONISTS, ArT-xEcIfINE sUAfwS, REdUqrIfAN dOrrSTERS, OR
SfIENTOrOyISTS - tOU fAN ArkAtS REfOyNInE THE fRAfwdOTS qt THEIR xATH. SO I Ax
yOINy TO cO xt qEST TO dROVIcE A VOIfE Om xATHExATIfAr SANITt - qOTH qt SHOkINy
kHAT'S kRONy kITH THE qAc xATH SrOd dUxdEc OUT qt THE rOONIES, ANc qt SHOkINy
HOk yOOc xATH kORwS.

From "SfIENfE", we can guess that "f" is cipher for "C";
from "xATH", "TIxE", and "NAxE" we can guess that "x" is cipher for M.
We can see "NEARrt", which should be either "NEARBY" or "NEARLY"; L occurs
much more frequently than B, and some of the other instances of "r" appear likely to be "L"s.

I cECIcEc IT IS AqOUT TIME TO START Ud MY OkN qLOy. ONE
THINy THAT I HAVE NOTICEc IN MY TIME qROkSINy THE qLOyOSdHERE IS THAT THERE ARE
A kHOLE LOT Om TERRImIC SCIENCE qLOyS OUT THERE: ORAC, dHARYNyULA, AETIOLOyY,
eUST TO NAME A mEk. qUT THERE IS NOT NEARLY SO MUCH OUT THERE cEcICATEc TO MATH
- ANc IN dARTICULAR TO THE MISUSE Om MATH. I THINw THAT THAT IS A cAMN SHAME,
qECAUSE IN MY EodERIENCE, ONE Om THE MOST CLEAR kAYS Om IcENTImYINy A CRACwdOT
IS THROUyH MATH. NO MATTER THE SdECImIC SUqeECT, THE CRACwdOTS ALkAYS EITHER
AVOIc OR SCREk Ud THE MATH. kHETHER IT'S THE "MERCURY CAUSES AUTISM" mOLwS, THE
VELIwOVSwIANS, CREATIONISTS, ALT-MEcICINE sUACwS, REdUqLICAN dOLLSTERS, OR
SCIENTOLOyISTS - YOU CAN ALkAYS RECOyNInE THE CRACwdOTS qY THEIR MATH. SO I AM
yOINy TO cO MY qEST TO dROVIcE A VOICE Om MATHEMATICAL SANITY - qOTH qY SHOkINy
kHAT'S kRONy kITH THE qAc MATH SLOd dUMdEc OUT qY THE LOONIES, ANc qY SHOkINy
HOk yOOc MATH kORwS.

You should be able to finish it from here: there are a ton of obvious
substittions now: "c/D", "q/B", "y/G", "d/P", "m/F", etc. You should also
now really understand why simple substition is such a poor encryption technique: look how easily we just cracked that! In fact, there's a
cryptanalysis proof that on average, you need less that 50 characters of
cipher-text to decode a simple substitution for english.

(The clear-text is the text of the first-ever post on the original version
of this blog, with contractions and paragraph breaks removed.)

Tags

More like this

The second major family of encryption techniques is called transposition ciphers. I find transposition ciphers to be rather dull; in their pure form, they're very simple, and not very difficult to crack, even without computers. But some of the most sophisticated modern ciphers can be looked at as…
Where encryption starts getting really interesting, in my opinion, is block ciphers. Block ciphers are a general category of ciphers that are sort of a combination of substitution and transposition ciphers, and sort of something entirely different. They're really fascinating things, but they're…
Sorry for the slow pace of the blog lately. I've been sick with a horrible sinus infection for the last month, and I've also been particularly busy with work, which have left me with neither the time nor the energy to do the research necessary to put together a decent blog post. After seeing an…
Now, we're finally reaching the point where the block-cipher stuff gets really fun: block cryptanalysis. As I've explained before, the key properties of a really good encryption system are: It's easy to compute the ciphertext given the plaintext and the key; It's easy to compute the plaintext…

For what it's worth, I would have reversed the cases, raising it all to uppercase and then lowering as letters are figured out. All uppercase -- SCREAMINGSTREAMING -- is harder for the eye to pick up, so putting the signal in lowercase and the noise in uppercase is easier on the eye and the brain.

Another point that shows how poor a simple substitution encryption technique is that newspapers regularly publish cryptoquotes which are essentially the same problem you just solved here.

Keep the cryptography posts coming; I really enjoy them. I pasted the ciphertext into a Java-based cryptogram helper which presented the cleartext almost immediately. I think it helps illustrate how fortunate we are to have computers perform the drudgery for us once we're aware of the method to use.

A good cryptosystem is one where (1) it's easy to generate the cipher-text given the clear-text and the secret key; (2) easy to generate the clear-text given the cipher-text and the secret key; (3) and very hard to figure out the secret key, even given a large number of clear-text/cipher-text pairs. (numbers added)

That's a deficient definition of a good cryptosystem, as En(P, K) = P; De(C, K) = C would qualify. Clearly a do-nothing function is easy, so (1) and (2) are satisfied; and since both P and C are completely independent of K, no amount of C/P pairs can reveal K, satisfying (3).

My 2 cents on this:
Any encryption is based on trying to remove meaning from the plain text. Any form of cryptanalysis is the attempt at finding meaning in an encrypted text.
From this simple substitution cypher and it's decryption we should see ever more ingenious ways of removing meaning and finding out if there is meaning.
But what really counts with this is trying to run out the clock on the person performing the cryptanalysis. The most secure encryption versions are also the ones requiring the most resources (and even more when doing cryptanalysis on it) but if something only has to be secure for a few hours a simpler encryption can be used.

@Anthony:
Then just add a restraint that K â  {Ã}

By Who Cares (not verified) on 15 Aug 2008 #permalink

Exercise for the mathematically-inclined: Did you need to be told that the plaintext was English, or could you have deduced that yourself before decrypting?

By Pseudonym (not verified) on 15 Aug 2008 #permalink

Pseudonym, one way of distinguishing the language is to calculate the index of coincidence (See http://en.wikipedia.org/wiki/Index_of_coincidence). This varies somewhat between languages. Other methods might involve counting the relative proportions of words of different lengths or simply comparing the frequency profile of the cipher alphabet with the typical distributions of different languages.

Very nice posts so far! Very appropriate as gentle introductory material. While discussing "classical crypto" it might be nice to begin to discuss in-depth what the "goals" of the attacker might be (thus setting up definitions of what "secure" means in various contexts). It feels like you are already making motions in this direction, with the "attack scenarios"/models. Eventually, showing how "semantic" security is equivalent to "indistinguishability" based definitions (CPA) might make a nice post.

on the same topic, in regard to this comment:
"...Any form of cryptanalysis is the attempt at finding meaning in an encrypted text..."

I think breaking the security of "non-malleable" systems (by showing them to be malleable) is also cryptanaylsis, but does not meet this informal criterion. I mean, given a ciphertext C, I may never know what number it represents .. but if I can transform it into another ciphertext C' such that the plaintexts for C' is 100 more than the plaintext for C, I may be able to, say, steal an auction. Maybe Mark will talk about non-malleability someday (*hopes*).

Cryptanalysis (in my mind) is any effort to break the security of a crypto-system. The security is defined in relationship to a goal and an attack model. This post gives us attack models, and I'm sure Mark will give us some goals in a later post (which is appropriate, since its a big topic).

I helped run a programme for high school students and this is one of the problems we used. Getting kids who've never even though about encryption before to crack the cipher text in not much more time than it takes to read it in the clear really makes it clear just how insecure substitution ciphers are.

Getting them to do one time pads by hand is slightly more challenging (with transcription errors and the like) but demonstrates some of that technique's strengths and weaknesses just as readily.

I'd also agree that the definition of a "good cryptosystem" does leave out what is perhaps the most important clause: it should be easy to encrypt and decrypt with the key and difficult, if not impossible, otherwise.

I'm looking forward to reading more on this topic.

bgc: Excellent answer. The IC, in particular, is one of the most powerful tools that you have to break classical ciphers. It seems counter-intuitive, but given only a ciphertext encrypted with any combination of transposition and simple substitution, you can tell what language the plaintext is written in even if you can't decode it yet.
(Similarly, for polyalphabetic ciphers, such as rotor ciphers, you can tell if two messages were encrypted with the same session key, as well as what language the plaintext is written in, from the IC alone.)

By Pseudonym (not verified) on 16 Aug 2008 #permalink

wF3AzJQ/zJSdxNiIkYkcDAgJGcVJcXABMAnISVlt
J0AJCI3A+EJOZMzW6YGN+ryJhFDLUgF0qABUxcVQ
QRVMhB1XR4FQb41czNmLpF29tFGevE2YuUQah1WY
49yLErDc0RHaA5FJAFFLf1iEgUrLYVDFmQiZCHjM
0gSZRRE2mgFYjhxURpJLiYRVZFkYyJ1IGVjOFVi1
RdiZzoCGh5zarU1WjlUU+twB

By Challenger (not verified) on 17 Aug 2008 #permalink

/*
This program implements the algorithm given by George
W. Hart in the September 1994 edition of the Communications
of the ACM (Volume 37, Number 9).

Edgar Allan Poe used this method to crack simple
substitution ciphers around 1839; consider the following
from pages 784-785 of David Kahn's The Codebreakers:

The closest he ever came to doing so was
when he demonstrated how he deduced that
a challenge sent him by G. W. Kulp, of
Lewiston, Pennsylvania, was a false
cryptogram. He picked out three words
in the cryptogram--MW, LAAM, and MLW.
Since "all English words of but two
letters consist of a vowel and a
consonant," he wrote, MW must be one of
30 words, which he listed. He then
inserted every letter of the alphabet in
the middle of all 30 words in an
exhaustive trial process to see which
letters would make a sensible word out
of MLW. Here he found 18, including ash
and tho'. Turning to LAAM, he observed,
that "if MLW be ash, then LAAM will be a
word of this form, s..a, in which the
dots represent two unknown letters of
the same kind." He ran through his 18
words in this way, and found that the
only one that gave a possible meaning
for LAAM was h..t, or hoot. "LAAM is
then hoot or nothing. But the hypothesis
of the word hoot is founded upon that of
the word tho'.... We now arrive at a
definite conclusion. Either Mr. Kulp's
puzzle is not genuine, or MW stands for
to, MLW for tho', and LAAM for hoot.
But it is evident that this latter
cannot be--for in that case both W and A
represent the letter 0. What
follows?--why that Mr. Kulp's puzzle is
no puzzle at all. This demonstration is
as absolutely conclusive as any
mathematical one could be. The process
of reasoning here employed is that
employed also in the solution of the
cyphers."
*/

#include
#include
#include
#include
#include

#define TRUE 1
#define FALSE 0

#define ALPHABET_SIZE 27 /* A-Z '; > 26 */

typedef struct PlainTextNode
{
char *szPlainText;
struct PlainTextNode *pNext;
} *PlainTextNodePtr;

typedef struct PatternNode
{
int nDistinctLetters;
char *szPattern;
unsigned long ulPlainTexts;
PlainTextNodePtr pPlainTextHead;
struct PatternNode *pLesser;
struct PatternNode *pGreater;
} *PatternNodePtr;

typedef struct SortedNode
{
char *szCipherText;
PatternNodePtr pPattern;
PlainTextNodePtr pPlainText;
struct SortedNode *pPrevious;
struct SortedNode *pNext;
} *SortedNodePtr;

typedef struct EncipheredNode
{
char *szCipherText;
PatternNodePtr pPattern;
SortedNodePtr pSorted;
struct EncipheredNode *pNext;
} *EncipheredNodePtr;

static PatternNodePtr AddEncipheredWord(char *,
EncipheredNodePtr *,
EncipheredNodePtr *,
PatternNodePtr *,int *);
static void AddPlainText(PatternNodePtr,char *,
char *,int *);
static void AddWordNotInDictionary(
PatternNodePtr,int *);
static void FreePattern(PatternNodePtr);
static void GetPatternForPlainTextWord(char *,
char *,int *);
int main(int,char **);
static int MyStricmp(char *,char *);
static void OutputSolution(EncipheredNodePtr,
char *,char *,unsigned long *,
char *,int *,int *);
static PatternNodePtr PatternForEncipheredWord(char *,
PatternNodePtr *,int *);
static char *PatternForWord(char *,int *,int *);
static void Solve(int,EncipheredNodePtr,
SortedNodePtr,char *,char *,int,
int *);

/*
Replace the following word list with your own or modify
the program to read the list from a file.
*/
static char *pszWord [] =
{
"a","able","about","above","according",
"account","across","act","action",
"activities","activity","actually",
"added","addition","administration",
"after","again","against","age","ago",
"ahead","aid","air","all","almost",
"alone","along","already","also",
"although","always","am","america",
"american","among","amount","an",
"analysis","and","another","answer",
"anti","any","anyone","anything",
"apparently","appear","appeared",
"approach","are","area","areas","army",
"around","art","as","ask","asked",
"association","at","attack","attention",
"audience","available","average","away",
"b","back","bad","ball","based",
"basic","basis","be","beautiful",
"became","because","become","bed",
"been","before","began","beginning",
"behind","being","believe","below",
"best","better","between","beyond",
"big","bill","black","blood","blue",
"board","body","book","born","both",
"boy","boys","bring","brought","brown",
"building","built","business","but",
"by","c","call","called","came","can",
"cannot","car","care","carried","cars",
"case","cases","cause","cent","center",
"central","century","certain",
"certainly","change","character",
"chief","child","children","choice",
"christian","church","city","class",
"clear","clearly","close","closed",
"club","co","cold","college","color",
"come","comes","coming","committee",
"common","communist","community",
"company","complete","completely",
"concerned","conditions","congress",
"consider","considered","continued",
"control","corner","corps","cost",
"costs","could","couldn\'t","countries",
"country","county","couple","course",
"court","covered","cut","d","daily",
"dark","data","day","days","de",
"dead","deal","death","decided",
"decision","deep","defense","degree",
"democratic","department","described",
"design","designed","determined",
"developed","development","did",
"didn\'t","difference","different",
"difficult","direct","direction",
"directly","distance","district","do",
"does","doing","done","don\'t","door",
"doubt","down","dr","drive","due",
"during","e","each","earlier","early",
"earth","east","economic","education",
"effect","effective","effects","effort",
"efforts","eight","either","elements",
"else","end","england","english",
"enough","entire","equipment",
"especially","established","europe",
"even","evening","ever","every",
"everything","evidence","example",
"except","existence","expect",
"expected","experience","extent","eye",
"eyes","f","face","fact","faith",
"fall","family","far","farm","father",
"fear","federal","feed","feel",
"feeling","feet","felt","few","field",
"figure","figures","filled","final",
"finally","find","fine","fire","firm",
"first","fiscal","five","floor",
"followed","following","food","foot",
"for","force","forces","foreign",
"form","former","forms","forward",
"found","four","free","freedom",
"french","friend","friends","from",
"front","full","function","further",
"future","g","game","gave","general",
"generally","george","get","getting",
"girl","girls","give","given","gives",
"glass","go","god","going","gone",
"good","got","government","great",
"greater","green","ground","group",
"groups","growing","growth","gun","h",
"had","hair","half","hall","hand",
"hands","happened","hard","has","have",
"having","he","head","hear","heard",
"heart","heavy","held","hell","help",
"her","here","herself","he\'s","high",
"higher","him","himself","his",
"history","hit","hold","home","hope",
"horse","hospital","hot","hotel",
"hour","hours","house","how","however",
"human","hundred","husband","i","idea",
"ideas","if","i\'ll","i\'m","image",
"immediately","important","in",
"include","including","income",
"increase","increased","indeed",
"individual","industrial","industry",
"influence","information","inside",
"instead","interest","international",
"into","involved","is","island",
"issue","it","its","it\'s","itself",
"i\'ve","j","job","john","just",
"justice","keep","kennedy","kept",
"kind","knew","know","knowledge",
"known","l","labor","lack","land",
"language","large","larger","last",
"late","later","latter","law","lay",
"lead","leaders","learned","least",
"leave","led","left","length","less",
"let","letter","letters","level",
"life","light","like","line","lines",
"list","literature","little","live",
"lived","living","local","long",
"longer","look","looked","looking",
"lost","lot","love","low","lower","m",
"made","main","major","make","makes",
"making","man","manner","mans","many",
"march","market","married","mass",
"material","matter","may","maybe","me",
"mean","meaning","means","medical",
"meet","meeting","member","members",
"men","merely","met","method",
"methods","middle","might","miles",
"military","million","mind","minutes",
"miss","modern","moment","money",
"month","months","moral","more",
"morning","most","mother","move",
"moved","movement","moving","mr","mrs",
"much","music","must","my","myself",
"n","name","nation","national",
"nations","natural","nature","near",
"nearly","necessary","need","needed",
"needs","negro","neither","never",
"new","next","night","no","non","nor",
"normal","north","not","note",
"nothing","now","nuclear","number",
"numbers","obtained","obviously","of",
"off","office","often","oh","old",
"on","once","one","ones","one\'s",
"only","open","opened","operation",
"opportunity","or","order",
"organization","other","others","our",
"out","outside","over","own","p",
"paid","paper","part","particular",
"particularly","parts","party","passed",
"past","pattern","pay","peace",
"people","per","performance","perhaps",
"period","person","personal","persons",
"physical","picture","piece","place",
"plan","plane","planning","plans",
"plant","play","point","points",
"police","policy","political","pool",
"poor","population","position",
"possible","post","power","present",
"president","press","pressure","price",
"principle","private","probably",
"problem","problems","process",
"production","products","program",
"programs","progress","property",
"provide","provided","public","purpose",
"put","quality","question","questions",
"quite","r","race","radio","ran",
"range","rate","rather","reached",
"reaction","read","reading","ready",
"real","really","reason","received",
"recent","recently","record","red",
"religion","religious","remember",
"report","reported","required",
"research","respect","responsibility",
"rest","result","results","return",
"returned","right","river","road",
"room","run","running","s","said",
"sales","same","sat","saw","say",
"saying","says","school","schools",
"science","season","second","secretary",
"section","see","seem","seemed",
"seems","seen","self","sense","sent",
"series","serious","served","service",
"services","set","seven","several",
"shall","she","short","shot","should",
"show","showed","shown","side",
"similar","simple","simply","since",
"single","situation","six","size",
"slowly","small","so","social",
"society","some","something",
"sometimes","somewhat","son","soon",
"sort","sound","south","southern",
"soviet","space","speak","special",
"specific","spirit","spring","square",
"st","staff","stage","stand",
"standard","start","started","state",
"statements","states","stay","step",
"steps","still","stock","stood","stop",
"stopped","story","straight","street",
"strength","strong","student",
"students","study","subject","such",
"suddenly","summer","sun","support",
"sure","surface","system","t","table",
"take","taken","taking","talk","tax",
"technical","tell","temperature","ten",
"term","terms","test","th","than",
"that","thats","the","their","them",
"themselves","then","theory","there",
"therefore","there\'s","these","they",
"thing","things","think","thinking",
"third","thirty","this","those",
"thought","three","through",
"throughout","thus","time","times",
"to","today","together","told","too",
"took","top","total","toward","town",
"trade","training","trial","tried",
"trouble","true","truth","try",
"trying","turn","turned","twenty",
"two","type","types","u","unlikely",
"under","understand","union","united",
"university","until","up","upon","us",
"use","used","using","usually","value",
"values","various","very","view",
"voice","volume","waiting","walked",
"wall","want","wanted","war","was",
"washington","wasn\'t","water","way",
"ways","we","week","weeks","well",
"went","were","west","western","what",
"whatever","when","where","whether",
"which","while","white","who","whole",
"whom","whose","why","wide","wife",
"will","william","window","wish",
"with","within","without","woman",
"women","word","words","work","worked",
"working","works","world","would",
"wouldn\'t","written","wrong","wrote",
"year","years","yes","yet","york",
"you","young","your","you\'re",NULL
};

int main(
int nArgs,
char **pszArg)
{
int bAlphabetic;
int bDuplicateWord;
int bErr;
int bKeepOnlyTheBest;
int bLetterUsed [26];
double dDistinctLettersPerWord;
double dMaxDistinctLettersPerWord;
FILE *fileCipherText;
int nCurrentChar;
int nDistinctWords;
int nLetter;
int nMatchingLetters;
int nMaxMatchingLetters;
int nWord;
int nWordLen;
char *pc;
EncipheredNodePtr pEncipheredHead;
EncipheredNodePtr pEnciphered1;
EncipheredNodePtr pEnciphered2;
EncipheredNodePtr pEncipheredForMax;
EncipheredNodePtr pEncipheredTail;
PatternNodePtr pPattern;
PatternNodePtr pPatternHead;
SortedNodePtr pSorted;
SortedNodePtr pSortedHead;
SortedNodePtr pSortedTail;
char szPattern [256];
char szWord [256];

bErr=FALSE;
if ((nArgs == 3) || (nArgs == 4))
{
if (nArgs == 4)
if (strcmp(*(pszArg+3),"1"))
bKeepOnlyTheBest=FALSE;
else
bKeepOnlyTheBest=TRUE;
else
bKeepOnlyTheBest=TRUE;
if (fileCipherText=fopen(*(pszArg+1),"r"))
{
remove(*(pszArg+2));
printf(
"Getting patterns in enciphered text...\n");
nDistinctWords=0;
pEncipheredHead=NULL;
pEncipheredTail=NULL;
pPatternHead=NULL;
do
{
while (((nCurrentChar
=fgetc(fileCipherText)) != EOF)
&& (nCurrentChar != (int) '\'')
&& (! islower(nCurrentChar
=tolower(nCurrentChar))));
if (nCurrentChar != EOF)
{
szWord[0]='\0';
nWordLen=0;
do
szWord[nWordLen++]
=(char) nCurrentChar;
while (((nCurrentChar
=fgetc(fileCipherText))
!= EOF)
&& (nWordLen < 255)
&& ((nCurrentChar == '\'')
|| (islower(nCurrentChar
=tolower(nCurrentChar)))));
szWord[nWordLen]='\0';
pPattern=AddEncipheredWord(&szWord[0],
&pEncipheredHead,&pEncipheredTail,
&pPatternHead,&bErr);
if ((! bErr)
&& (pPattern->pPlainTextHead == NULL))
AddWordNotInDictionary(pPattern,
&bErr);
}
}
while ((! bErr) && (nCurrentChar != EOF));
fclose(fileCipherText);
if (! bErr)
{
if (pPatternHead == NULL)
{
bErr=TRUE;
printf("Fatal error: \"%s\" contains "
"no words.\n",*(pszArg+1));
}
}
if (! bErr)
{
printf("Getting words matching the "
"patterns...\n");
nWord=0;
while (pszWord[nWord])
{
GetPatternForPlainTextWord(
pszWord[nWord],&szPattern[0],
&bAlphabetic);
if (bAlphabetic)
AddPlainText(pPatternHead,
&szPattern[0],pszWord[nWord],&bErr);
++nWord;
}
if (! bErr)
{
dMaxDistinctLettersPerWord=(double) 0;
pEnciphered1=pEncipheredHead;
while (pEnciphered1)
{
if (pEnciphered1->pPattern->
ulPlainTexts)
{
dDistinctLettersPerWord
=((double)
(pEnciphered1->pPattern->
nDistinctLetters))
/((double)
(pEnciphered1->pPattern->
ulPlainTexts));
if (dDistinctLettersPerWord
> dMaxDistinctLettersPerWord)
{
dMaxDistinctLettersPerWord
=dDistinctLettersPerWord;
pEncipheredForMax
=pEnciphered1;
}
}
pEnciphered1=pEnciphered1->pNext;
}
if (dMaxDistinctLettersPerWord
<= (double) 0)
{
bErr=TRUE;
printf("Fatal error: no word "
"matched any pattern in "
"\"%s\".\n",*(pszArg+1));
}
else
if (pSortedHead=(SortedNodePtr)
malloc(sizeof(struct SortedNode)))
{
pEncipheredForMax->pSorted
=pSortedHead;
pSortedHead->szCipherText
=pEncipheredForMax->
szCipherText;
pSortedHead->pPattern
=pEncipheredForMax->pPattern;
pSortedHead->pPlainText
=pEncipheredForMax->pPattern->
pPlainTextHead;
pSortedHead->pPrevious=NULL;
pSortedHead->pNext=NULL;
pSortedTail=pSortedHead;
for (nLetter=26; nLetter--;)
bLetterUsed[nLetter]=FALSE;
pc=pEncipheredForMax->
szCipherText;
while (*pc)
{
if (*pc != '\'')
bLetterUsed[tolower(
(int) *pc)-(int) 'a']
=TRUE;
++pc;
}
pEnciphered1
=pEncipheredHead->pNext;
while ((! bErr)
&& (pEnciphered1))
{
nMaxMatchingLetters=0;
pEncipheredForMax=NULL;
pEnciphered2=pEncipheredHead;
while (pEnciphered2)
{
if (pEnciphered2->pSorted
== NULL)
{
nMatchingLetters=0;
pc=pEnciphered2->
szCipherText;
while (*pc)
{
if ((*pc != '\'')
&& (bLetterUsed[
tolower(
(int) *pc)
-(int)
'a']))
++nMatchingLetters;
++pc;
}
if (nMatchingLetters
>=
nMaxMatchingLetters)
{
nMaxMatchingLetters
=nMatchingLetters;
pEncipheredForMax
=pEnciphered2;
}
}
pEnciphered2
=pEnciphered2->pNext;
}
bDuplicateWord=FALSE;
pEnciphered2=pEncipheredHead;
while ((! bDuplicateWord)
&& (pEnciphered2))
if (pEnciphered2->pSorted)
if (MyStricmp(
pEnciphered2->
szCipherText,
pEncipheredForMax->
szCipherText))
pEnciphered2
=pEnciphered2->pNext;
else
bDuplicateWord=TRUE;
else
pEnciphered2
=pEnciphered2->pNext;
if (bDuplicateWord)
pEncipheredForMax->pSorted
=pEnciphered2->pSorted;
else
if (pSorted=(SortedNodePtr)
malloc(sizeof(
struct SortedNode)))
{
++nDistinctWords;
pEncipheredForMax->
pSorted=pSorted;
pSorted->szCipherText
=pEncipheredForMax->
szCipherText;
pSorted->pPattern
=pEncipheredForMax->
pPattern;
pSorted->pPlainText
=pEncipheredForMax->
pPattern->
pPlainTextHead;
pSortedTail->pNext
=pSorted;
pSorted->pPrevious
=pSortedTail;
pSorted->pNext=NULL;
pSortedTail=pSorted;
pc=pEncipheredForMax->
szCipherText;
while (*pc)
{
if (*pc != '\'')
bLetterUsed[
tolower(
(int) *pc)
-(int) 'a']=TRUE;
++pc;
}
}
else
{
bErr=TRUE;
printf("Fatal error: "
"out of memory\n");
}
pEnciphered1
=pEnciphered1->pNext;
}
if (! bErr)
{
printf("Solving...\n");
Solve(nDistinctWords,
pEncipheredHead,pSortedHead,
*(pszArg+1),*(pszArg+2),
bKeepOnlyTheBest,&bErr);
}
while (pSortedHead)
{
pSorted=pSortedHead->pNext;
free((void *) pSortedHead);
pSortedHead=pSorted;
}
}
else
{
bErr=TRUE;
printf("Fatal error: out of "
"memory\n");
}
}
}
FreePattern(pPatternHead);
while (pEncipheredHead)
{
pEnciphered1=pEncipheredHead->pNext;
free((void *)
(pEncipheredHead->szCipherText));
free((void *) pEncipheredHead);
pEncipheredHead=pEnciphered1;
}
}
else
{
bErr=TRUE;
printf("Fatal error: \"%s\" cannot be opened "
"for input.",*(pszArg+1));
}
}
else
{
bErr=TRUE;
printf("%s solves simple substitution ciphers "
"with word divisions.\n\n"
" Usage: %s "
"[]\n\n"
" Only the solutions with the highest score "
"are kept when\n"
" is 1 (the default).\n\n"
"Example: %s cipher.txt plain.txt\n",
*pszArg,*pszArg,*pszArg);
}
return bErr;
}

static int MyStricmp(
char *sz1,
char *sz2)
{
register int result;

while ((! (result=(tolower(
(int) *sz1)-tolower((int) *sz2))))
&& *sz1++)
++sz2;
return result;
}

static void OutputSolution(
EncipheredNodePtr pEncipheredHead,
char *szIName,
char *szOName,
unsigned long *nPossibleSolutions,
char *szDecipherment,
int *nDeciphermentCount,
int *bErr)
{
FILE *fileCipherText;
FILE *filePlainText;
int nCurrentChar;
int nWordLen;
char *pc;
EncipheredNodePtr pEnciphered;

if (fileCipherText=fopen(szIName,"r"))
{
if (filePlainText=fopen(szOName,"a"))
{
printf("Writing possible solution %lu to "
"\"%s\"...\n",++(*nPossibleSolutions),
szOName);
pEnciphered=pEncipheredHead;
do
{
while ((! *bErr)
&& ((nCurrentChar
=fgetc(fileCipherText)) != EOF)
&& (nCurrentChar != (int) '\'')
&& (! islower(tolower(nCurrentChar))))
*bErr=(nCurrentChar
!= fputc(nCurrentChar,filePlainText));
if ((! *bErr) && (nCurrentChar != EOF))
{
nWordLen=0;
pc=pEnciphered->pSorted->pPlainText->
szPlainText;
if (*pc)
do
{
if (nCurrentChar == '\'')
*bErr=(nCurrentChar
!= fputc(nCurrentChar,
filePlainText));
else
if (islower(nCurrentChar))
{
nCurrentChar
=tolower((int) *pc);
*bErr=(nCurrentChar
!= fputc(nCurrentChar,
filePlainText));
}
else
{
nCurrentChar
=toupper((int) *pc);
*bErr=(nCurrentChar
!= fputc(nCurrentChar,
filePlainText));
}
++pc;
}
while ((! *bErr)
&& ((nCurrentChar
=fgetc(fileCipherText))
!= EOF)
&& (nWordLen < 255)
&& ((nCurrentChar == '\'')
|| (islower(tolower(
nCurrentChar)))));
else
do
{
if (nCurrentChar == '\'')
*bErr=(nCurrentChar
!= fputc(nCurrentChar,
filePlainText));
else
if (islower(nCurrentChar))
{
if (nDeciphermentCount[
nCurrentChar-(int) 'a'])
nCurrentChar=tolower(
(int) (szDecipherment[
nCurrentChar
-(int) 'a']));
else
nCurrentChar=(int) '*';
*bErr=(nCurrentChar
!= fputc(nCurrentChar,
filePlainText));
}
else
if (isupper(nCurrentChar))
{
if (nDeciphermentCount[
nCurrentChar-(int) 'A'])
nCurrentChar=toupper(
(int) (szDecipherment[
nCurrentChar
-(int) 'A']));
else
nCurrentChar=(int) '*';
*bErr=(nCurrentChar
!= fputc(nCurrentChar,
filePlainText));
}
else
{
nCurrentChar
=toupper((int) *pc);
*bErr=(nCurrentChar
!= fputc(nCurrentChar,
filePlainText));
}
}
while ((! *bErr)
&& ((nCurrentChar
=fgetc(fileCipherText)) != EOF)
&& (nWordLen < 255)
&& ((nCurrentChar == '\'')
|| (islower(
tolower(nCurrentChar)))));
if (! *bErr)
*bErr=(nCurrentChar != fputc(
nCurrentChar,filePlainText));
pEnciphered=pEnciphered->pNext;
}
}
while ((! *bErr) && (nCurrentChar != EOF));
if (! *bErr)
*bErr=((int) '\n' != fputc((int) '\n',
filePlainText));
if (*bErr)
printf("Fatal error: the file \"%s\" could "
"not be written.\n",szOName);
fclose(filePlainText);
}
else
{
*bErr=TRUE;
printf("Fatal error: \"%s\" cannot be opened "
"for output.",szOName);
}
fclose(fileCipherText);
}
else
{
*bErr=TRUE;
printf("Fatal error: \"%s\" cannot be opened for "
"input.",szIName);
}
return;
}

static void Solve(
int nDistinctWords,
EncipheredNodePtr pEncipheredHead,
SortedNodePtr pSortedHead,
char *szIName,
char *szOName,
int bKeepOnlyTheBest,
int *bErr)
{
int bBorrow;
int bContradiction;
char cDeciphered;
char cEnciphered;
int nCurrentChar;
int nDeciphermentCount [26];
int nEnciphermentCount [26];
int nScore;
int nScoreMax;
int nWord;
char *pc1;
char *pc2;
SortedNodePtr pSorted;
char szDecipherment [26];
char szEncipherment [26];
unsigned long ulPossibleSolutions;

ulPossibleSolutions=(unsigned long) 0;
for (nCurrentChar=26; nCurrentChar--;)
{
szDecipherment[nCurrentChar]='\0';
nDeciphermentCount[nCurrentChar]=0;
szEncipherment[nCurrentChar]='\0';
nEnciphermentCount[nCurrentChar]=0;
}
nScore=0;
nScoreMax=0;
pSorted=pSortedHead;
nWord=0;
while (pSorted)
{
if (*(pSorted->pPlainText->szPlainText))
++nScore;
++nWord;
pc1=pSorted->pPlainText->szPlainText;
bContradiction=FALSE;
if (*pc1) /* word in dictionary */
{
pc2=pSorted->szCipherText;
while ((! bContradiction) && (*pc2))
{
if (*pc1 != '\'')
{
cDeciphered=(char) tolower((int) *pc1);
cEnciphered=(char) tolower((int) *pc2);
if (nDeciphermentCount[
cEnciphered-'a'])
if (szDecipherment[cEnciphered-'a']
== cDeciphered)
++(nDeciphermentCount[
cEnciphered-'a']);
else
bContradiction=TRUE;
else
{
szDecipherment[cEnciphered-'a']
=cDeciphered;
nDeciphermentCount[cEnciphered-'a']
=1;
}
if (nEnciphermentCount[
cDeciphered-'a'])
if (szEncipherment[cDeciphered-'a']
== cEnciphered)
++(nEnciphermentCount[
cDeciphered-'a']);
else
bContradiction=TRUE;
else
{
szEncipherment[cDeciphered-'a']
=cEnciphered;
nEnciphermentCount[cDeciphered-'a']
=1;
}
}
++pc1;
++pc2;
}
}
else /* word not in dictionary */
{
if (nScore+nDistinctWords-nWord < nScoreMax)
bContradiction=TRUE;
}
if (bContradiction)
{
if (*(pSorted->pPlainText->szPlainText))
--nScore;
--nWord;
bBorrow=TRUE;
while ((bBorrow) && (pSorted))
{
pc1=pSorted->pPlainText->szPlainText;
if (*pc1)
{
bContradiction=FALSE;
pc2=pSorted->szCipherText;
while ((! bContradiction) && (*pc2))
{
if (*pc1 != '\'')
{
cDeciphered
=(char) tolower((int) *pc1);
cEnciphered
=(char) tolower((int) *pc2);
if (szDecipherment[
cEnciphered-'a']
== cDeciphered)
{
if (--(nDeciphermentCount[
cEnciphered-'a']) == 0)
szDecipherment[
cEnciphered-'a']='\0';
}
else
bContradiction=TRUE;
if (szEncipherment[
cDeciphered-'a']
== cEnciphered)
{
if (--(nEnciphermentCount[
cDeciphered-'a']) == 0)
szEncipherment[
cDeciphered-'a']='\0';
}
else
bContradiction=TRUE;
}
++pc1;
++pc2;
}
}
pSorted->pPlainText
=pSorted->pPlainText->pNext;
if (pSorted->pPlainText)
bBorrow=FALSE;
else
{
pSorted->pPlainText
=pSorted->pPattern->pPlainTextHead;
pSorted=pSorted->pPrevious;
if (pSorted)
{
if (*(pSorted->pPlainText->
szPlainText))
--nScore;
}
--nWord;
}
}
}
else
if (pSorted->pNext)
pSorted=pSorted->pNext;
else
{
if (nScore > nScoreMax)
{
nScoreMax=nScore;
if (bKeepOnlyTheBest)
{
remove(szOName);
printf("Previous possible solutions "
"have been deleted to make room "
"for better ones.\n");
ulPossibleSolutions
=(unsigned long) 0;
}
}
OutputSolution(pEncipheredHead,szIName,
szOName,&ulPossibleSolutions,
&szDecipherment[0],&nDeciphermentCount[0],
bErr);
if (*(pSorted->pPlainText->szPlainText))
--nScore;
--nWord;
bBorrow=TRUE;
while ((bBorrow) && (pSorted))
{
pc1=pSorted->pPlainText->szPlainText;
if (*pc1)
{
bContradiction=FALSE;
pc2=pSorted->szCipherText;
while ((! bContradiction) && (*pc2))
{
if (*pc1 != '\'')
{
cDeciphered
=(char) tolower((int) *pc1);
cEnciphered
=(char) tolower((int) *pc2);
if (szDecipherment[
cEnciphered-'a']
== cDeciphered)
{
if (--(nDeciphermentCount[
cEnciphered-'a']) == 0)
szDecipherment[
cEnciphered-'a']='\0';
}
else
bContradiction=TRUE;
if (szEncipherment[
cDeciphered-'a']
== cEnciphered)
{
if (--(nEnciphermentCount
[cDeciphered-'a']) == 0)
szEncipherment[
cDeciphered-'a']='\0';
}
else
bContradiction=TRUE;
}
++pc1;
++pc2;
}
}
pSorted->pPlainText
=pSorted->pPlainText->pNext;
if (pSorted->pPlainText)
bBorrow=FALSE;
else
{
pSorted->pPlainText
=pSorted->pPattern->pPlainTextHead;
pSorted=pSorted->pPrevious;
if (pSorted)
{
if (*(pSorted->pPlainText->
szPlainText))
--nScore;
}
--nWord;
}
}
}
}
if (! *bErr)
{
if (ulPossibleSolutions == (unsigned long) 0)
printf("No solutions were found.\n");
}
return;
}

static void AddWordNotInDictionary(
PatternNodePtr pPattern,
int *bErr)
{
PlainTextNodePtr pPlainText;

if (pPlainText=(PlainTextNodePtr)
malloc(sizeof(struct PlainTextNode)))
if (pPlainText->szPlainText
=(char *) malloc(sizeof(char)))
{
*(pPlainText->szPlainText)='\0';
pPlainText->pNext=NULL;
pPattern->pPlainTextHead=pPlainText;
}
else
{
*bErr=TRUE;
printf("Fatal error: out of memory\n");
free((void *) pPlainText);
}
else
{
*bErr=TRUE;
printf("Fatal error: out of memory\n");
}
return;
}

static void AddPlainText(
PatternNodePtr pPatternHead,
char *szPattern,
char *szPlainText,
int *bErr)
{
int nRelation;
int bFinished;
PatternNodePtr pInternal;
PlainTextNodePtr pPlainText;

pInternal=pPatternHead;
bFinished=FALSE;
while (! bFinished)
{
nRelation=strcmp(szPattern,pInternal->szPattern);
if (nRelation < 0)
if (pInternal->pLesser)
pInternal=pInternal->pLesser;
else
bFinished=TRUE;
else
if (nRelation > 0)
if (pInternal->pGreater)
pInternal=pInternal->pGreater;
else
bFinished=TRUE;
else
{
if (pPlainText=(PlainTextNodePtr)
malloc(sizeof(struct PlainTextNode)))
if (pPlainText->szPlainText=(char *)
malloc(1+strlen(szPlainText)))
{
strcpy(pPlainText->szPlainText,
szPlainText);
++(pInternal->ulPlainTexts);
pPlainText->pNext
=pInternal->pPlainTextHead;
pInternal->pPlainTextHead=pPlainText;
}
else
{
*bErr=TRUE;
printf(
"Fatal error: out of memory\n");
free((void *) pPlainText);
}
else
{
*bErr=TRUE;
printf(
"Fatal error: out of memory\n");
}
bFinished=TRUE;
}
}
return;
}

static void GetPatternForPlainTextWord(
char *szWord,
char *szPattern,
int *bAlphabetic)
{
char cDistinctChars;
char *pc1;
char *pc2;
char *pc3;
char *pc4;

*bAlphabetic=TRUE;
pc1=szWord;
pc2=szPattern;
while ((*bAlphabetic) && (*pc1) && (*pc1 != '\n'))
{
if (*pc1 != '\'')
{
*pc1=(char) tolower((int) *pc1);
*bAlphabetic=islower((int) *pc1);
}
*pc2='\0';
++pc2;
++pc1;
}
if (*bAlphabetic)
{
*pc1='\0';
*pc2='\0';
cDistinctChars='\0';
pc1=szWord;
pc2=szPattern;
while ((*pc1) && (*pc1 != '\n'))
{
if (*pc2 == '\0')
{
if (*pc1 == '\'')
*pc2=ALPHABET_SIZE;
else
{
*pc2=++cDistinctChars;
pc3=pc1+1;
pc4=pc2+1;
while (*pc3)
{
if (*pc3 == *pc1)
*pc4=cDistinctChars;
++pc4;
++pc3;
}
}
}
++pc2;
++pc1;
}
}
return;
}

static void FreePattern(
PatternNodePtr pPattern)
{
PlainTextNodePtr pPlainText;

if (pPattern)
{
FreePattern(pPattern->pLesser);
FreePattern(pPattern->pGreater);
free((void *) (pPattern->szPattern));
while (pPattern->pPlainTextHead)
{
pPlainText=pPattern->pPlainTextHead->pNext;
free((void *)
(pPattern->pPlainTextHead->szPlainText));
free((void *) (pPattern->pPlainTextHead));
pPattern->pPlainTextHead=pPlainText;
}
free((void *) pPattern);
}
return;
}

static char *PatternForWord(
char *szWord,
int *nDistinctLetters,
int *bErr)
{
char cDistinctChars;
char *pc1;
char *pc2;
char *pc3;
char *pc4;
char *szPattern;

if (szPattern=(char *) malloc(1+strlen(szWord)))
{
pc1=szWord;
pc2=szPattern;
while (*pc1)
{
*pc2='\0';
++pc2;
++pc1;
}
*pc2='\0';
cDistinctChars='\0';
pc1=szWord;
pc2=szPattern;
while (*pc1)
{
if (*pc2 == '\0')
{
if (*pc1 == '\'')
*pc2=ALPHABET_SIZE;
else
{
*pc2=++cDistinctChars;
pc3=pc1+1;
pc4=pc2+1;
while (*pc3)
{
if (*pc3 == *pc1)
*pc4=cDistinctChars;
++pc4;
++pc3;
}
}
}
++pc2;
++pc1;
}
*nDistinctLetters=cDistinctChars;
}
else
{
*bErr=TRUE;
printf("Fatal error: out of memory\n");
*nDistinctLetters=0;
}
return szPattern;
}

static PatternNodePtr PatternForEncipheredWord(
char *szEncipheredWord,
PatternNodePtr *pPatternHead,
int *bErr)
{
int bFinished;
int nDistinctLetters;
int nRelation;
PatternNodePtr pExternal;
PatternNodePtr pInternal;
char *szPattern;

pInternal=NULL;
szPattern=PatternForWord(szEncipheredWord,
&nDistinctLetters,bErr);
if (! *bErr)
{
if (*pPatternHead)
{
pInternal=*pPatternHead;
bFinished=FALSE;
do
{
nRelation
=strcmp(szPattern,pInternal->szPattern);
if (nRelation < 0)
if (pInternal->pLesser)
pInternal=pInternal->pLesser;
else
{
if (pExternal=(PatternNodePtr)
malloc(sizeof(struct PatternNode)))
{
pExternal->szPattern=szPattern;
pExternal->nDistinctLetters
=nDistinctLetters;
pExternal->ulPlainTexts
=(unsigned long) 0;
pExternal->pPlainTextHead=NULL;
pExternal->pLesser=NULL;
pExternal->pGreater=NULL;
pInternal->pLesser=pExternal;
}
else
{
*bErr=TRUE;
printf("Fatal error: out of "
"memory\n");
}
pInternal=pExternal;
bFinished=TRUE;
}
else
if (nRelation > 0)
if (pInternal->pGreater)
pInternal=pInternal->pGreater;
else
{
if (pExternal=(PatternNodePtr)
malloc(
sizeof(struct PatternNode)))
{
pExternal->szPattern
=szPattern;
pExternal->nDistinctLetters
=nDistinctLetters;
pExternal->ulPlainTexts
=(unsigned long) 0;
pExternal->pPlainTextHead=NULL;
pExternal->pLesser=NULL;
pExternal->pGreater=NULL;
pInternal->pGreater=pExternal;
}
else
{
*bErr=TRUE;
printf("Fatal error: out of "
"memory\n");
}
pInternal=pExternal;
bFinished=TRUE;
}
else
{
free((void *) szPattern);
bFinished=TRUE;
}
}
while (! bFinished);
}
else
if (pInternal=(PatternNodePtr)
malloc(sizeof(struct PatternNode)))
{
pInternal->szPattern=szPattern;
pInternal->nDistinctLetters=nDistinctLetters;
pInternal->ulPlainTexts=(unsigned long) 0;
pInternal->pPlainTextHead=NULL;
pInternal->pLesser=NULL;
pInternal->pGreater=NULL;
*pPatternHead=pInternal;
}
else
{
*bErr=TRUE;
printf("Fatal error: out of memory\n");
}
}
return pInternal;
}

static PatternNodePtr AddEncipheredWord(
char *szEncipheredWord,
EncipheredNodePtr *pSortedHead,
EncipheredNodePtr *pEncipheredTail,
PatternNodePtr *pPatternHead,
int *bErr)
{
EncipheredNodePtr pExternal;
PatternNodePtr pPattern;

pPattern=NULL;
if (pExternal=(EncipheredNodePtr)
malloc(sizeof(struct EncipheredNode)))
if (pExternal->szCipherText=(char *)
malloc(1+strlen(szEncipheredWord)))
{
strcpy(pExternal->szCipherText,szEncipheredWord);
pExternal->pSorted=NULL;
pPattern=PatternForEncipheredWord(
szEncipheredWord,pPatternHead,bErr);
pExternal->pPattern=pPattern;
pExternal->pNext=NULL;
if (*pEncipheredTail)
(*pEncipheredTail)->pNext=pExternal;
else
*pSortedHead=pExternal;
*pEncipheredTail=pExternal;
}
else
{
*bErr=TRUE;
printf("Fatal error: out of memory\n");
}
else
{
*bErr=TRUE;
printf("Fatal error: out of memory\n");
}
return pPattern;
}

By James L. Dean (not verified) on 18 Aug 2008 #permalink