[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

[QUIZ] Encyclopedia Construction (#205

Daniel Moore

5/15/2009 4:39:00 PM

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this
quiz until 48 hours have elapsed from the time this message was
sent.

2. Support Ruby Quiz by submitting ideas and responses
as often as you can!
Visit: <http://rubyquiz.../sugge...

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby Talk follow the discussion. Please reply to
the original quiz message, if you can.

RSS Feed: http://rubyquiz.../q...

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

## Encyclopedia Construction (#205)

This week's quiz comes from Tim Hunter. Thank you Tim for the great write up!

When I was a kid we had an encyclopedia. It had thousands of articles
in alphabetical order, divided into about 25 volumes. Sometimes there
were so many articles with the same initial letter ("M", for example)
that an entire volume was dedicated to articles with that initial
letter. In some cases, though, there weren't enough articles with the
same initial letter to devote an entire volume to them, so those
articles were grouped in with alphabetically adjacent articles in the
same volume. For example, all the articles starting with W, X, Y, and
Z were in the same volume. Of course, all the articles that started
with the same initial letter were always in the same volume. Each
volume was labeled with the initial character of the articles it
contained. For example, the "M" volume was simply labeled "M". If
the articles were spread over more than one initial character, the
label displayed the first and last characters in the range. The W, X,
Y, and Z volume was labeled "W-Z".

The quiz is to devise a way of grouping a list of encyclopedia articles into
volumes. Since this is Ruby Quiz, we'll say that for "articles" we
have a list of about 100 words composed of alphabetical characters,
a-z. A "volume" is an array.

Sort the list and then put the words into arrays using the following rules.

1. All the words with the same initial character must be in the same array.
2. If an array contains words with different initial characters, the
words must be alphabetically adjacent to each other. That is, if
"cat", "dog", and "elephant", are in the list of articles, and you
put "cat" and "elephant" in the same array, then "dog" must also be in
that array.
3. Without breaking rules 1 and 2, divide the words into about 10 arrays.
4. Without breaking rules 1 and 2, each of the arrays should contain
about the same number of words.

To display the encyclopedia, make a hash with one element for each
array. The element value is the array. The element key is a single
letter ("M") when the array only contains words with the same initial
character, and a range ("W-Z") when the array contains words with more
than one initial character. Print the hash with the "pp" command.

Here's the list of words:

ape
apple
apricot
aquamarine
architect
artichoke
azure
baboon
badger
banana
bat
battleship
beeswax
beetles
black
blogger
blue
bricklayer
Brigadoon
card
cat
cherry
chokeberry
coconut
coral
crimson
currant
dam
debate
deliberations
demagogue
dog
durian
elephant
empathy
encyclopedia
fig
fox
gazelle
giraffe
goldenrod
gray
hippopotamus
huckleberry
ibex
imprint
indigo
ivory
jab
jackrabbit
jake
khaki
kiwi
kudzu
lavender
lemming
leopard
lime
loganberry
mango
maroon
memory
mole
monkey
mouse
muskox
Nigeria
Navajo
olive
opera
panther
party
peach
pear
penguin
persimmon
plum
poet
politician
privy
programmer
rabbit
red
research
rhino
Rumpelstiltskin
salmon
silver
snow
soldier
songsmith
squirrel
starship
steel
strawberry
student
tan
tapir
theater
thesaurus
tree
turquoise
umbrella
warthog
waterloo
wildebeest
wolf
woodchuck
xylophone
yahoo
yellow
zebra



--
-Daniel
http://rubyquiz...

8 Answers

Frank Fischer

5/18/2009 3:39:00 PM

0

Hi,

here's my solution. It's based on classic dynamic programming over the
number of words for each letter.

I used different objective functions to get good book sizes:

1. Maximize the minimal number of words in one book,
2. minimize the maximal number of words in one book,
3. minimize the absolute deviation of the number of words in one
book to the average number of words in one book ,
4. minimize the quadratic deviation of the number of words in one
book to the average number of words in one book.

For the given word list, objective function 4 seems to yield the best
results with good distributed word counts.

The program is called with three arguments:

1. The name of the word-list file (one word per line),
2. the number of books in which the words should be distributed,
3. a number 1,2,3 or 4 selecting the objective function.

The programm prints

1. the number of words per letter,
2. the words per book,
3. the number of words per book.

The current implementation is quite inelegant and could be improved
and be more rubyish.


---
require 'facets/enumerable'
require 'enumerator'
require 'pp'

word_file = ARGV.shift || "words.txt"
nbooks = ARGV.shift.to_i || 10
algorithm = ARGV.shift.to_i || 3

# ruby 1.8
class Integer
def ord; self; end
end


class Wordlist

attr_reader :books

def initialize( words )
# sort them
@words = words.sort {|w1, w2| w1.downcase <=> w2.downcase }

# get number of words per letter
@nletters = Array.new(26, 0)
@words.each do |w|
@nletters[w.downcase[0].ord - ?a.ord] += 1
end

end


# Returns the number of words per character
def letter_counts
(?A .. ?Z).mash{|c| [c.chr, @nletters[c.ord - ?A.ord]]}
end


# Returns number of words in each book
def counts
@books.mash{|range, words| [range, words.size]}
end


# divide into +nbooks+ books by minimizing the maximal number
# of words in one book
def min_maximum( nbooks )
dyn_min( nbooks ) { |*s| s.max }
end


# divide into +nbooks+ books by maximizing the minmal number
# of words in one book
def max_minimum( nbooks )
dyn_min( nbooks ) { |*s| -s.min }
end


# divide into +nbooks+ books by minimizing the deviation from the
# average number of words in one book
def min_deviat( nbooks )
mean = sum( 0...@nletters.size ).to_f / nbooks
dyn_min( nbooks ) { |*s| s.inject(0){|sum,x| sum + (x - mean).abs} }
end


# divide into +nbooks+ books by minimizing the quadratic deviation
# from the average number of words in one book
def min_deviat2( nbooks )
mean = sum( 0...@nletters.size ).to_f / nbooks
dyn_min( nbooks ) { |*s| s.inject(0){|sum,x| sum + (x - mean) ** 2 } }
end

private

# computes
# sum_{i\in range} @nletters[i]
def sum( range )
range.inject(0) {|s,i| @nletters[i] + s}
end


# A range of letters in the same book.
class Book
def initialize( range, n_words )
@range = range
@n_words = n_words
end

# The first letter in this book
def min; @range.min; end

# The last letter in this book
def max; @range.max; end

# Is letter x in this book?
def include?(x); @range.include?(x); end

# returns the number of all words in this book
attr_reader :n_words
end


# Computes a solution where
# max{ func(part_sum) }
# is minimized
def dyn_min( nbooks, &func )
mean = sum( 0...@nletters.size ).to_f / nbooks

books = Array.new( @nletters.size )
s = 0
(@nletters.size - 1).downto(0) do |i|
s += @nletters[i]
range = i ... @nletters.size
books[i] = [Book.new( range, sum(range) )]
end

nbooks.times do

new_books = Array.new( @nletters.size )

for s in 0 ... @nletters.size
best_value = nil
best_end = nil
sum = @nletters[s] # value of the current part
for e in (s + 1) ... @nletters.size
# stop if no further subdivisions are possible
break unless books[e]

value = func.call(sum, *books[e].map{|r| r.n_words})
if best_value.nil? || value < best_value
best_value = value
best_end = e
end

sum += @nletters[e] || 0
end

if best_value
new_books[s] = [Book.new(s ... best_end,
sum(s ... best_end))] +
books[best_end]
end
end

books = new_books
end

@books = Hash.new
books[0].each do |book|
words = @words.find_all{|w|
book.include?(w.downcase[0].ord - ?a.ord)
}
if book.min == book.max
key = (?A.ord + book.min).chr
else
key = "#{(?A.ord + book.min).chr}-#{(?A.ord + book.max).chr}"
end
@books[key] = words
end

@books
end
end

# read word list
words = []
File.open( word_file, "r" ) do |f|
f.each_line do |line|
line.strip!
words << line unless line.empty?
end
end



list = Wordlist.new( words )
p list.letter_counts

case algorithm
when 1
list.min_maximum( nbooks )
when 2
list.max_minimum( nbooks )
when 3
list.min_deviat( nbooks )
when 4
list.min_deviat2( nbooks )
else
raise "Unknown algorithm: #{algorithm} (should be in {0,1,2,3})"
end

pp list.books
p list.counts
---

Bye,
Frank

Luke Cowell

5/19/2009 4:54:00 AM

0

Here's my solution:
http://www.pastie....


Usage:
e = Encyclopedia.new(articles)
e.volumes(20) #<-- how many volumes should it be condensed to?
pp e.hash

puts e.dump #<-- this will print a nice little chart to help you
visualize the distribution of articles.


My strategy was to:
-divide each group of words starting with the same letter into it's own
array
-if I had to reduce the number of volumes:
-combine 2 adjacent letter arrays into a single array based on which
2 arrays have the smallest combined length
-repeat this procedure until the desired number of volumes is
achieved

Notes:
-This algorithm can result in poorly balanced distribution of volumes
because it glues together adjacent volumes permanently. A more even
distribution could be achieved if you could break a volume off after it
had already been combined.
-Certain number of volumes will result in really even distribution. 17
with the provided articles worked out well.
-I think it would be interesting to split larger volumes as well. I
think in some encyclopedias you'll find volumes with more common words
split (eg. Ma - Me and Mi -Mz)


My code could be much improved - quick and dirty. Well, maybe not so
much quick, but dirty.

Luke
--
Posted via http://www.ruby-....

Frank Fischer

5/20/2009 7:26:00 AM

0

On 2009-05-18, Frank Fischer <frank-fischer@shadow-soft.de> wrote:

Here's a second version, this time returning the correct number of books
(the first version returned the requested number of books + 1). And it
should be a little cleaner and faster.

Frank

--
require 'facets/enumerable'
require 'enumerator'
require 'pp'

word_file = ARGV.shift || "words.txt"
nbooks = (ARGV.shift || 10).to_i
algorithm = (ARGV.shift || 4).to_i

# ruby 1.8
if RUBY_VERSION[0,4] == "1.8."
class Integer
def ord; self; end
end
end


# A book containing the words starting with a letter in first .. last
class Book

attr_reader :first, :last, :n_words, :words

# Create a book with n words whose first letter is in range.
def initialize( range, n_words )
@range = range.min.ord .. range.max.ord
@n_words = n_words
end


# Set the words in this book
def words=( words )
@words = words.find_all{|w| include?(w.downcase[0])}
@n_words = @words.size
end


# Returns true if the words starting with the given letter are in
# this book
def include?( letter )
@range.include?( letter.ord - ?a.ord )
end

# Returns the character-range.
def range
if @range.min == @range.max
(?A.ord + @range.min).chr
else
(?A.ord + @range.min).chr + "-" + (?A.ord + @range.max).chr
end
end


def to_s; "<Book: %3s, size: #@n_words>" % range; end

def inspect; to_s; end

end


class Wordlist

attr_reader :books

def initialize( words )
# sort them
@words = words.sort {|w1, w2| w1.downcase <=> w2.downcase }

# get number of words per letter
@nletters = Array.new( 26, 0 )
@words.each do |w|
@nletters[w.downcase[0].ord - ?a.ord] += 1
end
end


# Returns the number of words per character
def letter_counts
(0...@nletters.size).mash{|i| [(?a.ord + i).chr, @nletters[i]]}
end


# divide into +nbooks+ books by minimizing the maximal number
# of words in one book
def min_maximum( nbooks )
dyn_min( nbooks ) { |s, rest| [s, rest || 0].max }
end


# divide into +nbooks+ books by maximizing the minmal number
# of words in one book
def max_minimum( nbooks )
dyn_min( nbooks ) { |s, rest| [-s, (rest || -s)].max }
end


# divide into +nbooks+ books by minimizing the deviation from the
# average number of words in one book
def min_deviat( nbooks )
mean = sum( 0 ... @nletters.size ).to_f / nbooks
dyn_min( nbooks ) { |s, rest| (rest || 0) + (s - mean).abs }
end


# divide into +nbooks+ books by minimizing the quadratic deviation
# from the average number of words in one book
def min_deviat2( nbooks )
mean = sum( 0 ... @nletters.size ).to_f / nbooks
dyn_min( nbooks ) { |s, rest| (rest || 0) + (s - mean) ** 2 }
end


private

# computes
# sum_{i\in range} @nletters[i]
def sum( range )
range.inject(0) {|s,i| @nletters[i] + s}
end


# Computes a solution minimizing the objective function.
def dyn_min( nbooks, &func )
# create initial books (1 book)
books = Array.new( @nletters.size )
s = 0
(@nletters.size - 1).downto(0) do |i|
s += @nletters[i]
range = i ... @nletters.size
sum = sum(range)
books[i] = [[Book.new( range, sum )], func.call(sum, nil)]
end

(nbooks - 1).times do
books = (0 ... @nletters.size).map { |s|
best_value = nil
best_end = nil
sum = @nletters[s] # value of the current part
for e in s.next ... @nletters.size
if books[e]
value = func.call(sum, books[e][1])
if best_value.nil? || value < best_value
best_value = value
best_end = e
end
end
sum += @nletters[e] || 0
end

if best_value
[[Book.new(s...best_end, sum(s...best_end))] +
books[best_end][0],
best_value]
else
nil
end
}

end

@books = books[0][0]
# collect the words into the books
@books.each do |book|
book.words = @words
end
end
end

# read word list
words = []
File.open( word_file, "r" ) do |f|
f.each_line do |line|
line.strip!
words << line unless line.empty?
end
end



list = Wordlist.new( words )
p list.letter_counts
puts "Number of words: #{list.letter_counts.values.inject(0){|s,x| s + x}}"

case algorithm
when 1
list.min_maximum( nbooks )
when 2
list.max_minimum( nbooks )
when 3
list.min_deviat( nbooks )
when 4
list.min_deviat2( nbooks )
else
raise "Unknown algorithm: #{algorithm} (should be in {1,2,3,4})"
end

pp list.books.mash{|book| [book.range, book.words]}
p list.books.mash{|book| [book.range, book.n_words]}
--

Luke Cowell

5/20/2009 3:29:00 PM

0

I updated my solution too. No real improvements to the algorithm, but
the code is more 'rubyish'. For good or bad, I found it useful to
subclass Array.

http://www.pastie....

Luke
--
Posted via http://www.ruby-....

The Horny Goat

7/1/2014 4:09:00 PM

0

On Mon, 30 Jun 2014 10:27:42 -0700 (PDT), Alex Milman
<alexmilman@msn.com> wrote:

>> Indeed - I know one old D-Day Canadian vet (he was with the
>> Sherbrookes) who insists he was bombed by both the USAAF and RAF and
>> never by the Luftwaffe!
>
>And the people keep saying that internet is not good for sarcasm. :-)
>
>[It probably worth noticing that the person in question survived; speaking
>about the precision bombing].

Old Albert DID say that when they saw low flying aircraft the troops
tended to bail out of trucks and jeeps for the nearest ditch and check
national markings on the aircraft AFTERWARDS. (Given the nature of the
fighting after Falaise that makes perfect sense to me!)

His D-Day experience wasn't typical - he was a D-Day MP and his
primary job on D-Day was to hustle the troops off the beaches and into
the woods beyond and he said he was encouraged to be as nasty as he
had to be since the woods were much less prone to random bullets than
the beach. Bearing in mind where the Sherbrookes (and the Canadians
generally) fought I'm not the least surprised. He wasn't an MP for his
whole service - but on D-Day they wanted men on the beaches to get the
relatively green troops out of harms way.

I've told the story here more than once of how he was wounded and
taken prisoner on the East side of the Rhine in 1945 - he said he met
Dieppe POWs who had had a very harsh war and who he greatly respected
but that his own captivity was brief and anything but harsh with his
guards chiefly interested in treating prisoners well enough that they
would put in a good word for them after the surrender they all knew
was coming sooner than later.

(As an employee he was extremely popular with other employees and
customers in the morning as his knowledge was unsurpassed while
extremely UNpopular with staff and customers in the afternoon mostly
because his typical lunch was a garlic salad that he swore was the
secret to his longetivity!)

pyotr filipivich

7/1/2014 6:06:00 PM

0

The Horny Goat <lcraver@home.ca> on Tue, 01 Jul 2014 09:08:56 -0700
typed in soc.history.what-if the following:
>On Mon, 30 Jun 2014 10:27:42 -0700 (PDT), Alex Milman
><alexmilman@msn.com> wrote:
>
>>> Indeed - I know one old D-Day Canadian vet (he was with the
>>> Sherbrookes) who insists he was bombed by both the USAAF and RAF and
>>> never by the Luftwaffe!
>>
>>And the people keep saying that internet is not good for sarcasm. :-)
>>
>>[It probably worth noticing that the person in question survived; speaking
>>about the precision bombing].
>
>Old Albert DID say that when they saw low flying aircraft the troops
>tended to bail out of trucks and jeeps for the nearest ditch and check
>national markings on the aircraft AFTERWARDS. (Given the nature of the
>fighting after Falaise that makes perfect sense to me!)
>
>His D-Day experience wasn't typical - he was a D-Day MP and his
>primary job on D-Day was to hustle the troops off the beaches and into
>the woods beyond and he said he was encouraged to be as nasty as he
>had to be since the woods were much less prone to random bullets than
>the beach. Bearing in mind where the Sherbrookes (and the Canadians
>generally) fought I'm not the least surprised. He wasn't an MP for his
>whole service - but on D-Day they wanted men on the beaches to get the
>relatively green troops out of harms way.
>
>I've told the story here more than once of how he was wounded and
>taken prisoner on the East side of the Rhine in 1945 - he said he met
>Dieppe POWs who had had a very harsh war and who he greatly respected
>but that his own captivity was brief and anything but harsh with his
>guards chiefly interested in treating prisoners well enough that they
>would put in a good word for them after the surrender they all knew
>was coming sooner than later.
>
>(As an employee he was extremely popular with other employees and
>customers in the morning as his knowledge was unsurpassed while
>extremely UNpopular with staff and customers in the afternoon mostly
>because his typical lunch was a garlic salad that he swore was the
>secret to his longetivity!)

LOL.


My landlord in Germany was evacuated off the beach in Normandy on
June 6th. One of those Milliard Mark wounds - between the bones of
the forearm - got you out of combat and home, but not so badly shot up
as to totally ruin your life.
--
pyotr filipivich.
For Sale: Uncirculated Roman Drachmas, feature Julius Ceaser's Portrait,
several dated 44 BCE. Comes with Certificate of Authenticity.

The Horny Goat

7/1/2014 7:57:00 PM

0

On Tue, 01 Jul 2014 11:05:50 -0700, pyotr filipivich
<phamp@mindspring.com> wrote:

> My landlord in Germany was evacuated off the beach in Normandy on
>June 6th. One of those Milliard Mark wounds - between the bones of
>the forearm - got you out of combat and home, but not so badly shot up
>as to totally ruin your life.

As part of the action that made him a POW, old Albert took flesh
wounds in the thigh and buttock. According to him the main effect on
his life was qualifying him for an extra $50 / month ($25 for the
wound, $25 for the POW status) for the rest of his life from the
grateful Canadian taxpayer....

pyotr filipivich

7/2/2014 11:15:00 PM

0

The Horny Goat <lcraver@home.ca> on Tue, 01 Jul 2014 12:56:59 -0700
typed in soc.history.what-if the following:
>On Tue, 01 Jul 2014 11:05:50 -0700, pyotr filipivich
><phamp@mindspring.com> wrote:
>
>> My landlord in Germany was evacuated off the beach in Normandy on
>>June 6th. One of those Milliard Mark wounds - between the bones of
>>the forearm - got you out of combat and home, but not so badly shot up
>>as to totally ruin your life.
>
>As part of the action that made him a POW, old Albert took flesh
>wounds in the thigh and buttock. According to him the main effect on
>his life was qualifying him for an extra $50 / month ($25 for the
>wound, $25 for the POW status) for the rest of his life from the
>grateful Canadian taxpayer....

You're paid to stop a bullet
Its a soldiers job they say.
And so you stop a bullet.
And then they stop your pay.
Kipling.


--
pyotr filipivich.
For Sale: Uncirculated Roman Drachmas, feature Julius Ceaser's Portrait,
several dated 44 BCE. Comes with Certificate of Authenticity.