Asp Forum - Re: [SUMMARY] Cryptograms (#13

Michael C. Libby

1/8/2005 1:41:00 PM

[LONG POST that is short on Ruby specific content, apologies]

On Thu, 2005-01-06 at 22:53 +0900, Ruby Quiz wrote:
> Solving a cryptogram by brute force is prohibitively expensive. The maximum
> number of possible solutions is 26!, or roughly 4*10^26, so the first challenge
> is to pare down the search to something manageable.

25! actually. A character cannot map to itself. But it is still a rather
large number of possible keys. :)

> All words in a dictionary can be grouped into families according to their
> patterns, and each cryptographic token has its own pattern that corresponds
> (with any luck), to one of the families from the dictionary. If a token has no
> matching family, then it cannot be solved with the given dictionary, so we
> won't worry about that case too much.

One thing I did wonder about is why the dictionary we used didn't
include 'i' and 'a'-- both very common words in English, and the likely
solution any time we see a single character token in a cipher text. Some
of my early tests did inject these two words. I forgot to add that back
in and see what the difference was.

> Michael represents each map as a set of character pairs.

That was one area I struggled with: representing possible maps and then
how to compare them.

some sort of mapping arithmethic where

(abc => dog) + (abc => peg) = (abc => ..g)
(abc => ..g) + (abc => fog) = (abc => fog)

Here is where I'm going with this:

Every possible mapping is a number between 0 (where the mapping is
abcdefghijklmnopqrstuvwxyz => ..........................) and 26! (which
would be a mapping of abcdefghijklmnopqrstuvwxyz =>
zyxwvutsrqponmlkjihgfedcba). In any case, it can be represented most
compactly as a 26 digit number in base 27 (26 letters plus the unsolved
token).

27 is not a computer friendly number, but 32 is.

32 = 2 ** 5 = 11111 in base 2 notation

So our mappings would easily work if we convert each character to a base
32 number (some of our number space will go unused) and concatenate. The
entire thing fits in a 130 digit binary number then. I am not going to
draw out entire examples, but the first mapping above (all ....s) would
be 000...0 and the other one (zyx...) would be 111...1.

Here it is in action with a pair of two letter mappings (ab => de) and
(ab => be) (x and y respectively):

> irb(main):033:0> x = 0b0010000101; y = 0b0001000101
> => 69
> irb(main):034:0> sprintf("%010b",(x & y))
> => "0000000101"

The resulting string is clearly the map (ab => .e)

The problem is that the example is contrived and falls down when the mappings
are (ab => ce) and (ab => be)-- any case where the binary representations have
some matching bits. (00011 (c) & 00010 (b) = 00010 (b), which is wrong).

Is there a way to overcome this? Maybe it's time to write a quick Mapping class
that provides comparison and combination methods... but it would be nice if there
were some easier way. :)

> The weakness in both mine and Michael's approach is that tokens are always
> mixed in to the solution using a single pre-defined order. But the tokens that
> are mixed in first can have an overwhelming influence on the final maps that
> result. In the worst case, the first token to be mapped can make it impossible
> to add any other tokens to the map.
>
> The only solution I know is to add another wrapper around the entire search
> process that mutates the order of the token mixing. I've implemented this in
> Python, but not in the Ruby version, and I think Michael's solution might be a
> better place to start from when persuing this.

I tried longest token first (most likely to have unique pattern?),
shortest token first (just for kicks), least number of patterns first
(keep the running list of possible maps to a minimum). I tried just
doing them in order presented, too. One thing I didn't try that I think
may be promising is finding the longest, most unique word and then
sorting the other tokens such that each successive token shares as much
of the mapping with what's already been solved.

But this is always going to be a problem. The other, worse problem, is
settling on a mapping that solves a non-dictionary-word token using a
dictionary word (i.e. "thomas aqua edison").

Anyway, this was great fun. My decrypt.rb solves the cryptogram in the
paper without too much work, so I'm happy. :)

-Michael, www.andsoforth.com

comp.lang.ruby

Re: [SUMMARY] Cryptograms (#13

Michael C. Libby

x Login to ForumsZone