Asp Forum - Fwd: Please Forward: Ruby Quiz Submission

James Gray

7/12/2006 7:32:00 PM

Begin forwarded message:

> From: James Quick <jimquick@mac.com>
> Date: July 12, 2006 9:51:17 AM CDT
> To: submission@rubyquiz.com
> Subject: Please Forward: Ruby Quiz Submission
>
> I am not on the ruby-talk list, and would appreciate if you could
> pass this
> along. This was a nice problem to get my feet wet in this language. I
> began looking at Ruby last week after hearing about what an old
> colleague
> of mine was doing with it. Because I am familiar with several
> languages
> with which it shares some features (smalltalk, eiffel, objective-
> c, perl) it
> is deceptively easy to pick up, and occasionally infuriating. Thank
> you,
> James, for providing this excellent service to the community.
> Sometimes
> it is essential to have throwaway problems to guide one's exploration.
>
> My first attempts were very classy (in a bad way) partitioning
> things into separate
> classes, rigorously defining interfaces and accessor methods. They
> were also
> very slow, and usually failed to converge on a solution.
>
> Then, by reading source code for a bunch of modules in /usr/lib/
> ruby, I found
> that mixing a few methods into the base classes was not only
> sufficient for
> most of my uses but preferable. Profiling showed that much of my
> time was
> being spent in each_byte loops so I ripped out the conversions to
> and from
> Strings and replaced it with a byte comparison factory which stored
> lambda
> functions. This way, I could pretend that I was producing arrays of
> byte counts
> from string operations without actually doing them more than once.
>
> This was more than twice as fast as my previous attempts but was still
> not converging very quickly. I had already optimized my target_counts
> array by initializing it with complete information on what was
> knowable
> about a solution before starting a search. Unfortunately during the
> search,
> if several hundred thousand loops had occurred, I would do some sanity
> checking which proved to be insane. I converted the target counts
> into a
> string then back into a count array. If the real result of the
> sanity check was
> beyond the range of the target and source, I would munge one or more
> values into my target array, thus throwing out all the good work!
>
> It was not until I saw the excellent submissions by Simon Krer that
> I realized
> that my sanity checking was demoting me to a random search. Optimal
> random
> Robinizing is more like solving a rubix cube than like searching
> randomly
> for text. If properly initialized the target and result are
> mathematically linked.
> If you make any change to the target or source without making a
> complementary
> change to the other, you will immediately devolve into a random
> search.
> Doing so is the equivalent of occasionally removing a piece from the
> cube, twisting it, and putting it back. You may have made it
> impossible
> to solve (unless by chance you randomly undo what you just did).
>
> Basically, the target contains a direct representation of the
> pangram search
> space in the form of c->n. Each count expresses the assertion that
> there be
> n occurrences of the character 'c' in the pangram. For a particular
> value of
> n it may be a completely bogus assertion. All that matters is that
> the assertion
> is congruent with the information stored in the result_string.
>
> I initialize my target and source with the exact values for
> constant text
> and their spelled out occurrences. Simon does it far more simply at
> the
> expense of a few more iterations (but his chainsaw has a much more
> powerful engine!). He initializes his target (which he calls
> guess) to
> an array of 1's indicating the invariant assumption that there will be
> 1 occurrence of each letter of the alphabet in the pangram. His
> result
> starts with an array of 1's (representing a-z from the itemized
> list n a's, n b's .... and n z's). He then adds the counts for
> "and", the prefix
> string, and 26 copies of the word "one". The 26 copies of the word
> "one"
> provide the required link which binds the guess and the "real" result
> together. As soon as he chooses a better value than 1 for a letter
> frequency he will subtract "one" and then add the spelled out version
> of the new choice.
>
> These 26 copies of "one", or the loop of calls to adder lambda
> functions
> in my version, each perform an equivalent task. They ensure that the
> target and result are rigorously linked.
>
> From here on, despite their quite different implementations, our
> solutions
> are essentially the same. Random Robinizing proceeds as follows:
> When the target and source differ, randomly choose a new value(n).
> Add or subtract that value from the source (plain integer arithmetic).
> In the destination, decrement the letter counts for each letter in the
> spelled out number (before the change), then increment the letter
> counts for the new spelled out number.
>
> As you can see, the source and destination have completely different
> operations performed on them. They are different but embody the
> essence of the pangram itself. A self referential sentence whose
> implementation (spelling) is identical to its semantic meaning
> (numerical commentary about the sentence).
>
> If you (like me) ran into a performance wall, and found that
> answers were converging slowly or not at all, take a step back.
> As soon as you break the underlying relationship between the
> target and destination, your program will stray into a random walk,
> which may contain long cycles of aimless wandering.
>
> I think there may be some bugs lurking here, though I think most are
> gone. I would appreciate any stylistic or other advice on best
> practices.
> I alternated several times between studlyCaps and c style naming.
> I also realize that I have know idea about the relative performance
> tradeoffs between cacheing variables and evaluating method calls.
>
> I did see some fairly large performance boosts when variables are
> found first in the outer scope rather then being allocated and lost
> with each iteration.
>
> Finally, I am interested in learning where to look for up to date
> information on the language definition or syntax. I could only find
> something circa 1.6.
>
> The following code makes me realize that I am far from being
> proficient in this language. However, I hope the comments and
> the basic algorithm are clear enough to provide some useful
> insight into the problem domain.
>
> here is some sample output.
>
> As you can see 9 of the 26 letters are already solved before the
> first pass
> through the loop. This neighborhood quickly converged. In retrospect
> I am now sure that the 'q' from my initials helped. Zebras are useful
> too.
> lili% ruby jqpangram.rb
> "--- 0 Wed Jul 12 10:32:42 EDT 2006"
> [7, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 2, 2, 1, 1, 1, 1, 1,
> 1, 1, 1, 2]
> [7, 2, 2, 2, 24, 2, 2, 2, 2, 2, 1, 1, 3, 24, 28, 2, 2, 5, 12, 10,
> 1, 2, 8, 1, 1, 2]
> "--- 10000 Wed Jul 12 10:32:47 EDT 2006"
> [7, 2, 2, 2, 28, 5, 3, 7, 8, 2, 1, 2, 3, 16, 17, 2, 2, 12, 31, 25,
> 3, 7, 12, 3, 4, 2]
> [7, 2, 2, 2, 35, 5, 4, 8, 8, 2, 1, 3, 3, 16, 15, 2, 2, 10, 32, 26,
> 2, 9, 13, 2, 4, 2]
> "--- 20000 Wed Jul 12 10:32:52 EDT 2006"
> [7, 2, 2, 2, 26, 6, 2, 4, 10, 2, 1, 1, 3, 22, 16, 2, 2, 10, 31, 26,
> 2, 5, 14, 4, 5, 2]
> [7, 2, 2, 2, 21, 7, 2, 3, 9, 2, 1, 1, 3, 17, 20, 2, 2, 9, 31, 25,
> 4, 4, 14, 5, 5, 2]
> "--- 30000 Wed Jul 12 10:32:57 EDT 2006"
> [7, 2, 2, 2, 28, 6, 4, 6, 10, 2, 1, 2, 3, 17, 16, 2, 2, 9, 30, 25,
> 3, 5, 12, 3, 4, 2]
> [7, 2, 2, 2, 27, 6, 3, 6, 10, 2, 1, 2, 3, 16, 15, 2, 2, 10, 32, 24,
> 3, 6, 12, 4, 4, 2]
> "--- 40000 Wed Jul 12 10:33:01 EDT 2006"
> [7, 2, 2, 2, 29, 8, 4, 8, 9, 2, 1, 3, 3, 15, 17, 2, 2, 12, 30, 23,
> 4, 6, 12, 2, 4, 2]
> [7, 2, 2, 2, 28, 7, 4, 7, 9, 2, 1, 3, 3, 17, 16, 2, 2, 11, 30, 25,
> 4, 5, 13, 2, 4, 2]
> "--- 50000 Wed Jul 12 10:33:06 EDT 2006"
> [7, 2, 2, 2, 30, 6, 2, 7, 10, 2, 1, 1, 3, 19, 17, 2, 2, 11, 30, 23,
> 3, 6, 10, 4, 4, 2]
> [7, 2, 2, 2, 28, 4, 2, 6, 7, 2, 1, 2, 3, 19, 16, 2, 2, 11, 31, 23,
> 3, 5, 10, 3, 4, 2]
> "--- 60000 Wed Jul 12 10:33:11 EDT 2006"
> [7, 2, 2, 2, 29, 7, 2, 6, 9, 2, 1, 3, 3, 19, 16, 2, 2, 11, 33, 23,
> 4, 5, 12, 4, 4, 2]
> [7, 2, 2, 2, 31, 6, 2, 6, 9, 2, 1, 3, 3, 20, 16, 2, 2, 12, 31, 23,
> 4, 6, 12, 3, 4, 2]
> "solution found in 69833 iterations"
> [7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25,
> 3, 5, 12, 4, 4, 2]
> [7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25,
> 3, 5, 12, 4, 4, 2]
> "mypan----------------------------------------"
> [7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25,
> 3, 5, 12, 4, 4, 2]
> "A pangram from jq contains one zebra, seven a's, two b's, two c's,
> two d's, twenty-six e's, eight f's, three g's, six h's, ten i's,
> two j's, one k, two l's, three m's, fifteen n's, sixteen o's, two
> p's, two q's, ten r's, thirty-one s's, twenty-five t's, three u's,
> five v's, twelve w's, four x's, four y's, and two z's."
> true
>
>
> cut here VVVV jqpangram.rb follows
> --------------------------------------
> #!/usr/local/bin/ruby -w
>
> class Integer
> ONES = %w[ zero one two three four five six seven eight nine ]
> TEEN = %w[ten eleven twelve thirteen fourteen fifteen
> sixteen seventeen eighteen nineteen ]
> TENS = %w[ zero ten twenty thirty forty fifty
> sixty seventy eighty ninety ]
>
> def to_en
> d, n = self.divmod(10)
> if d == 0
> ONES[n]
> elsif d == 1
> TEEN[n]
> elsif d < 10
> TENS[d] + ((n > 0) ? ("-" + ONES[n]) : "")
> else
> raise "Class Integer#to_en only goes to 99 not #{self}"
> end
> end
> end
>
> class String
> # to_counts returns a 26 element array containing the counts of
> # each letter in the input string. e.g. "add".to_counts ->
> [1,0,0,2,0...]
> # Only counting lower case for speed. Be sure to downcase.
> def to_counts
> counts = Array.new(26).fill(0)
> each_byte do |b| counts[b - ?a] += 1 if b >= ?a && b <= ?z end
> counts
> end
>
> # similar to to_counts except adds the result into an existing
> array.
> def accum_counts(counts)
> each_byte do |b| counts[b - ?a] += 1 if b >= ?a && b <= ?z end
> end
> end
>
> class Array
> include Enumerable
> # Return an array of textual pangram sections from a count_array
> def pan_body()
> self.zip((?a..?z).to_a).collect do |n, c|
> sprintf(" %s%s %c%s", > (c==?z ? "and " : ""), > n.to_en, c, > (n>1) ? "'s" : "")
> end
> end
>
> # Turn a pangram text array into a string.
> def pan_string()
> pan_body().join(',') + "."
> end
> end
>
> class CounterFactory
> # The designated initializer is called with a pangram prefix
> string
> # e.g. "This string contains". It sets up a pair of hash tables
> # filled with pattern counting functions and other initial values.
> def initialize(withString=nil)
> @add_word = {}
> @subtract_word = {}
> add_numbers('+', @add_word)
> add_numbers('-', @subtract_word)
> add_initial(withString) if withString != nil
> add_numbers('+', @add_word)
> end
>
> # add_function takes an input string and creates lambda
> functions which
> # transform count arrays (like those made by to_counts).
> # add_function(1, "one", "+", add_word) creates a function of
> arity 1
> # and adds it to the @add_word hash table with a key of 1.
> # Calling this function with a count array increments the 'o',
> 'n', and 'e'
> # letter counts. If the input array was initially filled with
> 0, then
> # the content of that array is equivalent to "one".to_counts.
> def add_function(key, aString, op, table)
> counts = aString.downcase.to_counts
> src = 'lambda { |targ| '
> counts.each_with_index do |n,i|
> src += "targ[#{i}] #{op}= #{n}; " if (n > 0)
> end
> src += '}'
> table[key] = eval(src)
> end
>
> # Add functions for each named number, with integer keys, to a
> table.
> def add_numbers(op, table)
> #add_function(0, "no", op, table)
> add_function(1, "one", op, table)
> for n in (2..99)
> add_function(n, n.to_en + "s", op, table)
> end
> end
>
> # Add a function mapping the constant elements of a pangram to
> a count array.
> def add_initial(prefix)
> # A pangram is : "prefix" + counts of each letter + "and"
> before 'z'
> known_string = prefix.downcase + ('a'..'z').to_a.join("") +
> "and"
> add_function(:initial, known_string, '+', @add_word)
>
> # known_count[letter] is the count of each letter we know
> must exist
> known_count = known_string.to_counts
>
> name_bytes = Array.new(26).fill(0)
> (1..99).each { |n| n.to_en.accum_counts(name_bytes) }
> in_names = name_bytes.collect {|n| n > 0}
> # Now we have an array of booleans representing whether a
> character
> # is constant (only appearing in the prefix or enumerated
> list)
> # or is a variable quantity (because it also appears in
> number names
>
> # From the count of what we initially knew to be present,
> return
> # our initial target. This is our best initial guess for a
> result
> # which will converge quickly. Basically, anything which is
> # not found in a name, is now known to occur a constant
> number of
> # times in the result. If it is in a name, I just punt and
> # prepare to see it 1 time like Simon does.
> @target_template = in_names.zip(known_count).collect do |
> in_name, known|
> if in_name
> 1
> else
> known
> end
> end
>
> # Starting with the count of known contents add the letter
> counts
> # for the numbers we expect to see. The counts in the
> target template
> # indicate that we expect to see N occurences of that
> (char) in the
> # result. Thus if 7 a's are reflected in the target we must
> have
> # an associated set of bytes {seven's} in the result. The
> following
> # code initializes the result template with the static byte
> counts
> # that we know must occur in the result. Using the target
> template as
> # a guide, we then call the appropriate adder_function to
> increment
> # the counts for the spelled out numbers which MUST be
> present in
> # the result if our guess remains congruent. e.g. if the
> target contained
> # [7, 2, 1, 2]... we MUST increment the result_template by
> # "sevens" "twos" "one" "twos"
> @result_template = known_count.dup
> @target_template.each do |n|
> @add_word[n].call(@result_template)
> end
>
> end
>
> # Make sure these don't get clobbered so return a copy
> # Should these be dup'ed or frozen instead?
> def target_template()
> @target_template.dup
> end
> def result_template()
> @result_template.dup
> end
>
> # This is syntactic sugar, taking any number of arguments
> # or an array of arguments and calling all of them on a fresh
> array.
> def array_from_functions(*args)
> counts = Array.new(26).fill(0)
> for arg in args
> if arg.is_a?(Array)
> arg.each do |o| @add_word[o].call(counts); end
> else
> p arg
> @add_word[arg].call(counts);
> end
> end
> counts
> end
>
> # Easy access to the function maps
> attr_reader :add_word, :subtract_word
> end
>
> def pangram_main()
> sallowPrefix = 'This pangram tallies'
> sallowTarget = [5, 1, 1, 2, 28, 8, 6, 8, 13, 1, 1,
> 3, 2, 18, 15, 2, 1, 7, 25, 22, 4, 4, 9, 2, 4, 1];
> prefix = "A pangram from jq contains one zebra,"
>
> # Initialize the CounterFactory and grab the maps
> cf = CounterFactory.new(prefix)
> add_word=cf.add_word
> subtract_word=cf.subtract_word
>
> # Set up the target and result arrays.
> # Though these will converge eventually these are not
> interchangeable!
> # They are complementary not equivalent. The target initially
> contains
> # a set of small integers. Each n refers to the expectation of
> finding
> # N of a particular character in the output. Each N == 1
> indicates
> # that we have no idea how many occur in the end result.
> # Any fortunate N > 1 means, we know exactly how may will be in
> the
> # final tally. The result_count entries contain sums for each byte
> # we know we will encounter and also the sum for the spelled out
> # numbers reflected in out target_count array.
> #
> # These two structure mirror what a pangram is.
> # The target reflects the semantic intent ->
> # It (the pangram) will have: 1 a, 2 b's...
> # The result reflects the mechanical instantiation
> # "This pangram sentence has one a, two b's..."
> target_count = cf.target_template
> result_count = cf.result_template
>
> delta, i, f, target, result = nil
> loopcount = 0
> different = true
>
> # We will loop until we converge
> while different
> if loopcount % 10000 == 0
> p "--- #{loopcount} " + Time.new.to_s
> p target_count, result_count
> end
> loopcount += 1
>
> # Each time through, visit and compare each target and
> result element
> # While they differ, we will change our guess by updating the
> # counts in the target_array. Change a guess by increasing
> # or decreasing the expected occurence of a single character.
> # Reflect that change in the result by adding and subtracting
> # multiple counts in the result corresponding to spelled out
> # numbers. For instance if target_count[i] is changed from
> # 7 to 12 we decrement the result buckets for the bytes in
> # 'sevens' and increment by the bytes in 'twelves'. The 'seve'
> # changes are a wash so we have 1 less n and one more t, w,
> and l
> # in the result.
>
> different = false
>
> # If the target and result differ make
> for i in (0..25)
> target = target_count[i]
> result = result_count[i]
>
> if target != result
> delta = rand((target - result).abs + 1)
>
> begin
> subtract_word[target].call(result_count)
> if target < result
> target = target + delta
> else
> target = result + delta
> end
> add_word[target].call(result_count)
> target_count[i] = target
> different = true
> rescue
> # I'm not sure how to handle this yet.
> # A while ago I was getting errors with some
> # pangram prefix strings. The calculations
> # diverged, causing calls to nonexistent
> # adders or subtractors: with keys 0, -1, -2...
> # instead of fixnums between 1 and 99
> # This usually happens only after 500K or so
> iterations.
> raise
> end
>
> end
> end
> end
>
> p "solution found in #{loopcount} iterations"
> p target_count, result_count
> mypan = prefix + target_count.pan_string
> t2=(mypan.downcase).to_counts
> p "mypan" + "--"*20
> p t2, mypan
> p t2 == target_count
> end
>
> if __FILE__ == $0
> pangram_main()
> end
> #[7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31,
> 25, 3, 5, 12, 4, 4, 2]
> #panstring="A pangram from jq contains one zebra, seven a's, two
> b's, two c's, > #two d's, twenty-six e's, eight f's, three g's, six h's, ten i's,
> two j's, one k, > #two l's, three m's, fifteen n's, sixteen o's, two p's, two q's,
> ten r's, > #thirty-one s's, twenty-five t's, three u's, five v's, twelve w's,
> four x's, > #four y's, and two z's."
> #true
>
>

comp.lang.ruby

Fwd: Please Forward: Ruby Quiz Submission

James Gray

x Login to ForumsZone