comp.lang.ruby

[QUIZ] Statistician I (#167)

Matthew Moss

6/27/2008 3:57:00 PM

[Note: parts of this message were removed to make it a legal post.]

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The three rules of Ruby Quiz 2:

1. Please do not post any solutions or spoiler discussion for this
quiz until 48 hours have passed from the time on this message.

2. Support Ruby Quiz 2 by submitting ideas as often as you can! (A
permanent, new website is in the works for Ruby Quiz 2. Until then,
please visit the temporary website at

<http://splatbang.com/rub....
3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby Talk follow the discussion. Please reply to
the original quiz message, if you can.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
## Statistician I (#167)

This week begins a three-part quiz, the final goal to provide a little
library for parsing and analyzing line-based data. Hopefully, each portion
of the larger problem is interesting enough on its own, without being too
difficult to attempt. The first part -- this week's quiz -- will focus on
the pattern matching.

Let's look at a bit of example input:

You wound Perl for 15 points of Readability damage.
You wound Perl with Metaprogramming for 23 points of Usability damage.
Your mighty blow defeated Perl.
C++ walks into the arena.
C++ wounds you with Compiled Code for 37 points of Speed damage.
You wound C++ for 52 points of Usability damage.

Okay, it's silly, but it is similar to a much larger data file I'll provide
at the end for testing.

You should definitely note the repetitiveness: just the sort of thing that
we can automate. In fact, I've examined the input above and created three
rules (a.k.a. patterns) that match (most of) the data:

[The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].
You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
Your mighty blow defeated[ the] <name>.

There are a few guidelines about these rules:

1. Text contained within square brackets is optional.
2. A word contained in angle brackets represents a field; not a literal
match, but data to be remembered.
3. Fields are valid within optional portions.
4. You may assume that both the rules and the input lines are stripped of
excess whitespace on both ends.

Assuming the rules are in `rules.txt` and the input is in `data.txt`,
running your Ruby script as such:

> ruby reporter.rb rules.txt data.txt

Should generate the following output:

Rule 1: Perl, 15, Readability
Rule 1: Perl, Metaprogramming, 23, Usability
Rule 2: Perl
# No Match
Rule 0: C++, Compiled Code, 37, Speed
Rule 1: C++, 52, Usability

Unmatched input:
C++ walks into the arena.

Each line of the output corresponds to a line of the input; it indicates
which rule was matched (zero-based index), and outputs the matched fields'
values. Any lines of the input that could not be matched to one of the rules
should output a "No Match" comment, with all the unmatched input records
printed in the "Unmatched input" section at the end (so the author of the
rules can extend them appropriately).

One thing you should keep in mind while working on this week's quiz is that
you want to be flexible; followup quizzes will require that you modify
things a bit.

For testing, I am providing two larger datasets: combat logs taken from Lord
of the Rings Online gameplay. There is data for a [Guardian][1] and a
[Hunter][2]; unzip before use. Both use the same ruleset:

[The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].
You are wounded for <amount> point[s] of <kind> damage.
You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
You reflect <amount> point[s] of <kind> damage to[ the] <name>.
You succumb to your wounds.
Your mighty blow defeated[ the] <name>.



[1]: http://www.splatbang.com/rubyquiz/files/gu...
[2]: http://www.splatbang.com/rubyquiz/files/...



--
Matthew Moss <matthew.moss@gmail.com>

12 Answers

Matthew Moss

6/29/2008 6:45:00 PM

Here's my own submission for this problem. Once you wrap your head
around a few bits of the regular expression, it's pretty simple to
understand.



class Rule
  attr_reader :fields

  def initialize(str)
    patt = str.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
    @pattern = Regexp.new('^' + patt + '$')
    @fields = nil
  end

  def match(str)
    if md = @pattern.match(str)
      @fields = md.captures
    else
      @fields = nil
    end
  end
end


rules = []
File.open(ARGV[0]).each do |line|
  line.strip!
  next if line.empty?
  rules << Rule.new(line)
end


unknown = []
File.open(ARGV[1]).each do |line|
  line.strip!
  if line.empty?
    puts
    next
  end

  if rule = rules.find { |rule| rule.match(line) }
    indx, data = rules.index(rule), rule.fields.reject { |f| f.nil? }
    puts "Rule #{indx}: #{data.join(', ')}"
  else
    unknown << line
    puts "# No match"
  end
end


puts "\nUnmatched input:"
puts unknown.join("\n")
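
To see what the two `gsub` calls in `Rule#initialize` produce, here is a small worked example. It applies the same substitutions to the second rule from the quiz and prints the resulting regular expression source. The variable names are only for illustration and are not part of the submission above.

```ruby
rule = "You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage]."

# The same two substitutions as Rule#initialize in the post above:
# [text] becomes an optional non-capturing group, <field> a lazy capture.
patt  = rule.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
regex = Regexp.new('^' + patt + '$')

puts regex.source
# ^You wound(?: the)? (.+?)(?: with (.+?))? for (.+?) point(?:s)? of (.+?)(?: damage)?.$

with_attack    = regex.match("You wound Perl with Metaprogramming for 23 points of Usability damage.").captures
without_attack = regex.match("You wound Perl for 15 points of Readability damage.").captures

p with_attack     # ["Perl", "Metaprogramming", "23", "Usability"]
p without_attack  # ["Perl", nil, "15", "Readability"]
```

The `nil` in the second result comes from the optional `<attack>` group that did not take part in the match, which is why the reporting loop drops `nil` fields before joining them.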


Matthew Rudy Jacobs

6/29/2008 8:19:00 PM

Matthew Moss wrote:
>
> def initialize(str)
> patt = str.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
> @pattern = Regexp.new('^' + patt + '$')
> @fields = nil
> end

does the rule string not need to be regexp escaped somehow if it's
gonna be directly Regexp.new'ed?

I fear a rule with something like "You run away[ from <name>] (you
coward)" would break this approach.
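
To make the concern concrete, the sketch below feeds that example rule through the same two substitutions as the quoted `initialize`. The unescaped parentheses turn into a capture group, so the line that should match does not, while a line without literal parentheses does. The input lines are made up for illustration.

```ruby
rule = "You run away[ from <name>] (you coward)"

# Same translation as the quoted initialize, with no escaping of the rule text.
patt  = rule.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
regex = Regexp.new('^' + patt + '$')

# The parentheses are read as a regex group, not as literal characters.
intended   = regex.match("You run away from Perl (you coward)")
unintended = regex.match("You run away from Perl you coward")

p intended             # nil
p unintended.captures  # ["Perl", "you coward"]
```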

Matthew Rudy
--
Posted via http://www.ruby-....

krusty.ar@gmail.com

6/30/2008 12:38:00 AM

> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> ## Statistician I (#167)
>

This is my first quiz. It's very rough, but it works most of the time.

I'm probably (re)implementing a very limited form of regular
expressions, but in the process I discovered several ways it could
fail. In the test cases, only the case noted in the comments comes up.

Here is the code: http://pastie....
And the rules that catch most of the samples: http://pastie....

Lucas.

Matthias Reitinger

6/30/2008 1:20:00 AM

Here is my submission. I hope it's flexible enough for the followup
quizzes. I sensed there might be a need to access the fields of a match
by name, which is why I added the RuleMatch#fields method. It returns a
hash that allows code like

puts Rule.match(line).fields['amount'] # prints the value of the <amount> field

This method isn't used in the current code however. But who knows, it
might come in handy later on.
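
The submission itself is only linked, so the following is a hypothetical sketch of the idea and not Matthias's code. `NamedRule` and `NamedMatch` are made-up names, and the rule translation is borrowed from the regex approach posted earlier in the thread (without escaping). It shows one way to pair field names with captured values in a hash.

```ruby
# Hypothetical names; this is not the linked submission.
NamedMatch = Struct.new(:rule_index, :fields)

class NamedRule
  def initialize(index, text)
    @index = index
    # Field names in order of appearance, e.g. ["name", "attack", ...].
    @names = text.scan(/<(.+?)>/).flatten
    patt   = text.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
    @regex = Regexp.new('^' + patt + '$')
  end

  # Returns a NamedMatch whose fields hash maps name => value, or nil.
  def match(line)
    md = @regex.match(line)
    return nil unless md
    NamedMatch.new(@index, Hash[@names.zip(md.captures)])
  end
end

rule = NamedRule.new(0, "[The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].")
m = rule.match("C++ wounds you with Compiled Code for 37 points of Speed damage.")

puts m.fields['amount']  # 37
puts m.fields['attack']  # Compiled Code
```

Optional fields that did not take part in a match show up in the hash with a `nil` value.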

You can find my submission at http://www.pastie....

- Matthias
--
Posted via http://www.ruby-....

Matthew Moss

6/30/2008 1:20:00 AM



On Jun 29, 3:19 pm, Matthew Rudy Jacobs <matthewrudyjac...@gmail.com>
wrote:
> Matthew Moss wrote:
>
> >   def initialize(str)
> >     patt = str.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
> >     @pattern = Regexp.new('^' + patt + '$')
> >     @fields = nil
> >   end
>
> does the rule string not need to be regexp escaped somehow if it's
> gonna be directly Regexp.new'ed?
>
> I fear a rule with something like "You run away[ from <name>] (you
> coward)" would break this approach.


Perhaps... My solution is likely not safe from all input sets. While I
hadn't considered literal parentheses as part of the rule set, I
should have at the least considered the period (match any char).

For the current purposes, it is sufficient if your solution supports
the provided example ruleset, though any additional work towards
escaping parts/preventing breakage is certainly acceptable.
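
One possible way to add that escaping, sketched here as an assumption and not as part of any posted solution, is to split the rule into bracket, field, and literal tokens and pass only the literal tokens through `Regexp.escape`. The helper name `compile_rule` is hypothetical.

```ruby
# Hypothetical helper: translate a rule into a regex, escaping literal text.
def compile_rule(text)
  # Splitting on a capture group keeps the delimiters in the result.
  source = text.split(/(\[|\]|<.+?>)/).map do |token|
    case token
    when '['          then '(?:'
    when ']'          then ')?'
    when /\A<.+>\z/   then '(.+?)'
    else Regexp.escape(token)   # periods, parentheses, etc. become literal
    end
  end.join
  Regexp.new('\A' + source + '\z')
end

coward = compile_rule("You run away[ from <name>] (you coward)")
defeat = compile_rule("Your mighty blow defeated[ the] <name>.")

p coward.match("You run away from Perl (you coward)").captures  # ["Perl"]
p defeat.match("Your mighty blow defeated Perl.").captures      # ["Perl"]
p defeat.match("Your mighty blow defeated Perl!")               # nil
```

With the literal text escaped, the trailing period only matches a period, and the parentheses in the first rule match themselves.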

ThoML

6/30/2008 5:48:00 AM

> ## Statistician I (#167)

Here is my solution:
http://www.pastie....

It's for ruby19 only.

ThoML

6/30/2008 6:02:00 AM

On Jun 30, 7:48 am, ThoML <micat...@gmail.com> wrote:
> > ## Statistician I (#167)
>
> Here is my solution:http://www.pastie....

So I thought I could use pastie for a change so that I could still
make minor modifications and don't have to repost the code. But ...
wrong URL. Sorry, let's hope this is the right one:

http://www.pastie....

Regards,
Thomas.

Jesús Gabriel y Galán

6/30/2008 7:09:00 AM

On Fri, Jun 27, 2008 at 5:56 PM, Matthew Moss <matthew.moss@gmail.com> wrote:

> ## Statistician I (#167)

Hi,

This is my try at this quiz. I thought it would be cool to store the
field "names" too, for each match. I also added a verbose output to
show the field name and the value. As the goal was to be flexible too,
I made some classes to encapsulate everything, to prepare for the future:

class Match
  attr_accessor :captures, :mappings, :rule

  def initialize captures, mappings, rule
    @captures = captures
    @mappings = mappings
    @rule = rule
  end

  def to_s verbose=false
    s = "Rule #{@rule.id}: "
    if verbose
      @rule.names.each_with_index {|n,i| s << "[#{n} => #{@mappings[n]}]" if @captures[i]}
      s
    else
      s + "#{@captures.compact.join(",")}"
    end
  end
end

class Rule
  attr_accessor :names, :id

  # Translate rules to regexps, specifying if the first captured group
  # has to be remembered
  RULE_MAPPINGS = {
    "[" => ["(?:", false],
    "]" => [")?", false],
    /<(.*?)>/ => ["(.*?)", true],
  }

  def initialize id, line
    @id = id
    @names = []
    escaped = escape(line)
    reg = RULE_MAPPINGS.inject(escaped) do |line, (tag, value)|
      replace, remember = *value
      line.gsub(tag) do |m|
        @names << $1 if remember
        replace
      end
    end
    @reg = Regexp.new(reg)
  end

  def escape line
    # From the mappings, change the regexp sensitive chars with non-sensitive ones
    # so that we can Regexp.escape the line, then sub them back
    escaped = line.gsub("[", "____").gsub("]", "_____")
    escaped = Regexp.escape(escaped)
    escaped.gsub("_____", "]").gsub("____", "[")
  end

  def match data
    m = @reg.match data
    return nil unless m
    map = Hash[*@names.zip(m.captures).flatten]
    Match.new m.captures, map, self
  end
end

class RuleSet
  def initialize file
    @rules = []
    File.open(file) do |f|
      f.each_with_index {|line, i| @rules << Rule.new(i, line.chomp)}
    end
    p @rules
  end

  def apply data
    match = nil
    @rules.find {|r| match = r.match data}
    match
  end
end

rules_file = ARGV[0] || "rules.txt"
data_file = ARGV[1] || "data.txt"

rule_set = RuleSet.new rules_file

matches = nil
unmatched = []
File.open(data_file) do |f|
  matches = f.map do |line|
    m = rule_set.apply line.chomp
    unmatched << line unless m
    m
  end
end

matches.each do |m|
  if m
    puts m
  else
    puts "#No match"
  end
end

unless unmatched.empty?
  puts "Unmatched input: "
  puts unmatched
end

#~ puts "Verbose output:"
#~ matches.each do |m|
#~   if m
#~     puts (m.to_s(true))
#~   else
#~     puts "#No match"
#~   end
#~ end

Jesús Gabriel y Galán

6/30/2008 7:11:00 AM

On Mon, Jun 30, 2008 at 3:19 AM, Matthew Moss <matthew.moss@gmail.com> wrote:
>
>
> On Jun 29, 3:19 pm, Matthew Rudy Jacobs <matthewrudyjac...@gmail.com>
> wrote:
>> Matthew Moss wrote:
>>
>> > def initialize(str)
>> > patt = str.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
>> > @pattern = Regexp.new('^' + patt + '$')
>> > @fields = nil
>> > end
>>
>> does the rule string not need to be regexp escaped somehow if it's
>> gonna be directly Regexp.new'ed?
>>
>> I fear a rule with something like "You run away[ from <name>] (you
>> coward)" would break this approach.
>
>
> Perhaps... My solution is likely not safe from all input sets. While I
> hadn't considered literal parentheses as part of the rule set, I
> should have at the least considered the period (match any char).
>
> For the current purposes, it is sufficient if your solution supports
> the provided example ruleset, though any additional work towards
> escaping parts/preventing breakage is certainly acceptable.

I had to escape the string in order to make my solution work due to
the final dot...
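
The post does not spell out why the final dot mattered, so the following is one plausible reading, offered as an assumption. The `Rule` class posted earlier builds an unanchored regex with lazy `(.*?)` captures. With an unescaped trailing `.`, the lazy capture can stay empty while the dot consumes the first character of the name. The two patterns below are written by hand to show only that difference.

```ruby
line = "Your mighty blow defeated Perl."

# Unanchored, lazy capture followed by an unescaped dot (any character).
unescaped = Regexp.new('Your mighty blow defeated(?: the)? (.*?).')
# Same pattern with the dot escaped so it only matches a literal period.
escaped   = Regexp.new('Your mighty blow defeated(?: the)? (.*?)\.')

unescaped_name = unescaped.match(line)[1]
escaped_name   = escaped.match(line)[1]

p unescaped_name  # ""
p escaped_name    # "Perl"
```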

Jesus.

benjamin.billian@googlemail.com

6/30/2008 2:05:00 PM

Here is my solution to this week's quiz. It's also my first RubyQuiz.

http://www.pastie....