Jesús Gabriel y Galán
6/30/2008 7:09:00 AM
On Fri, Jun 27, 2008 at 5:56 PM, Matthew Moss <matthew.moss@gmail.com> wrote:
> ## Statistician I (#167)
>
> This week begins a three-part quiz, the final goal to provide a little
> library for parsing and analyzing line-based data. Hopefully, each portion
> of the larger problem is interesting enough on its own, without being too
> difficult to attempt. The first part -- this week's quiz -- will focus on
> the pattern matching.
>
> Let's look at a bit of example input:
>
> You wound Perl for 15 points of Readability damage.
> You wound Perl with Metaprogramming for 23 points of Usability damage.
> Your mighty blow defeated Perl.
> C++ walks into the arena.
> C++ wounds you with Compiled Code for 37 points of Speed damage.
> You wound C++ for 52 points of Usability damage.
>
> Okay, it's silly, but it is similar to a much larger data file I'll provide
> at the end for testing.
>
> You should definitely note the repetitiveness: just the sort of thing that
> we can automate. In fact, I've examined the input above and created three
> rules (a.k.a. patterns) that match (most of) the data:
>
> [The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].
> You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
> Your mighty blow defeated[ the] <name>.
>
> There are a few guidelines about these rules:
>
> 1. Text contained within square brackets is optional.
> 2. A word contained in angle brackets represents a field; not a literal
> match, but data to be remembered.
> 3. Fields are valid within optional portions.
> 4. You may assume that both the rules and the input lines are stripped of
> excess whitespace on both ends.
>
> Assuming the rules are in `rules.txt` and the input is in `data.txt`,
> running your Ruby script as such:
>
> > ruby reporter.rb rules.txt data.txt
>
> Should generate the following output:
>
> Rule 1: Perl, 15, Readability
> Rule 1: Perl, Metaprogramming, 23, Usability
> Rule 2: Perl
> # No Match
> Rule 0: C++, Compiled Code, 37, Speed
> Rule 1: C++, 52, Usability
>
> Unmatched input:
> C++ walks into the arena.
>
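To make the guidelines concrete, here is a hand-written sketch of the regular expression that the second rule (reported as "Rule 1" in the expected output) could correspond to. The constant `RULE_1` and the helper `rule_1_fields` are names made up for this illustration, and anchoring the pattern to the whole line is an assumption based on guideline 4.

```ruby
# Hand-written equivalent of the rule:
#   You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
# Optional text becomes a non-capturing group followed by `?`;
# each field becomes a lazy capture group.
RULE_1 = /\AYou wound(?: the)? (.*?)(?: with (.*?))? for (.*?) points? of (.*?)(?: damage)?\.\z/

# Hypothetical helper: returns the captured fields, or nil if the line does not match.
def rule_1_fields(line)
  m = RULE_1.match(line)
  m && m.captures.compact   # drop fields from optional parts that did not match
end

p rule_1_fields("You wound Perl for 15 points of Readability damage.")
# => ["Perl", "15", "Readability"]
p rule_1_fields("You wound Perl with Metaprogramming for 23 points of Usability damage.")
# => ["Perl", "Metaprogramming", "23", "Usability"]
p rule_1_fields("C++ walks into the arena.")
# => nil
```

The lazy `(.*?)` groups matter here: a greedy `<name>` would swallow the optional " with <attack>" part.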
Hi,

This is my try at this quiz. I thought it would be cool to store the
field "names" too, for each match. I also added a verbose output that
shows each field name with its value. As the goal was to be flexible
too, I made some classes to encapsulate everything and prepare for
the future:
class Match
  attr_accessor :captures, :mappings, :rule

  def initialize captures, mappings, rule
    @captures = captures
    @mappings = mappings
    @rule = rule
  end

  def to_s verbose=false
    s = "Rule #{@rule.id}: "
    if verbose
      @rule.names.each_with_index {|n,i| s << "[#{n} => #{@mappings[n]}]" if @captures[i]}
      s
    else
      # Separator matches the quiz's expected output ("Perl, 15, Readability")
      s + @captures.compact.join(", ")
    end
  end
end
class Rule
  attr_accessor :names, :id

  # Translate rules to regexps, specifying if the first captured group
  # has to be remembered
  RULE_MAPPINGS = {
    "[" => ["(?:", false],
    "]" => [")?", false],
    /<(.*?)>/ => ["(.*?)", true],
  }

  def initialize id, line
    @id = id
    @names = []
    escaped = escape(line)
    reg = RULE_MAPPINGS.inject(escaped) do |pattern, (tag, value)|
      replace, remember = *value
      pattern.gsub(tag) do |m|
        @names << $1 if remember
        replace
      end
    end
    @reg = Regexp.new(reg)
  end

  def escape line
    # From the mappings, change the regexp sensitive chars with
    # non-sensitive ones so that we can Regexp.escape the line, then sub
    # them back. Note: adjacent brackets such as "[[" or "[]" would
    # confuse these placeholders.
    escaped = line.gsub("[", "____").gsub("]", "_____")
    escaped = Regexp.escape(escaped)
    escaped.gsub("_____", "]").gsub("____", "[")
  end

  def match data
    m = @reg.match data
    return nil unless m
    map = Hash[*@names.zip(m.captures).flatten]
    Match.new m.captures, map, self
  end
end
class RuleSet
  def initialize file
    @rules = []
    File.open(file) do |f|
      f.each_with_index {|line, i| @rules << Rule.new(i, line.chomp)}
    end
    #~ p @rules
  end

  # Returns the Match for the first rule that matches, or nil
  def apply data
    match = nil
    @rules.find {|r| match = r.match data}
    match
  end
end
rules_file = ARGV[0] || "rules.txt"
data_file = ARGV[1] || "data.txt"
rule_set = RuleSet.new rules_file
matches = nil
unmatched = []
File.open(data_file) do |f|
  matches = f.map do |line|
    m = rule_set.apply line.chomp
    unmatched << line unless m
    m
  end
end
matches.each do |m|
  if m
    puts m
  else
    puts "# No Match"
  end
end
unless unmatched.empty?
  puts
  puts "Unmatched input:"
  puts unmatched
end
#~ puts "Verbose output:"
#~ matches.each do |m|
#~   if m
#~     puts m.to_s(true)
#~   else
#~     puts "# No Match"
#~   end
#~ end
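
As a possible alternative to swapping brackets for placeholder strings before calling Regexp.escape, a rule can be split into tokens and each token translated on its own. This is only a sketch: `compile_rule` is a hypothetical helper that is not part of the script above, and anchoring the pattern to the whole line is an added assumption.

```ruby
# Sketch: compile a rule by tokenizing it, so literal text can be escaped
# piece by piece without placeholder strings.
# `compile_rule` is a hypothetical helper, not part of the script above.
def compile_rule(rule)
  names = []
  # The capture group in the split pattern keeps the delimiters in the result.
  source = rule.split(/(\[|\]|<[^>]*>)/).map do |token|
    case token
    when "[" then "(?:"            # start of optional text
    when "]" then ")?"             # end of optional text
    when /\A<(.*)>\z/
      names << $1                  # remember the field name
      "(.*?)"                      # lazy capture for the field value
    else
      Regexp.escape(token)         # literal text
    end
  end.join
  # Assumption: a rule has to match the whole (already stripped) line.
  [Regexp.new("\\A#{source}\\z"), names]
end

rule = "You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage]."
regexp, names = compile_rule(rule)
m = regexp.match("You wound C++ for 52 points of Usability damage.")
# Pair each field name with its value, skipping optional fields that did not match.
fields = names.zip(m.captures).reject { |_, value| value.nil? }
p fields
# => [["name", "C++"], ["amount", "52"], ["kind", "Usability"]]
```

Because every bracket is its own token, nested or adjacent optional parts such as "[[a]b]" do not need special handling.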