Jan-Erik R.
2/7/2009 8:54:00 PM
Tom Cloyd wrote:
> David A. Black wrote:
>> Hi --
>>
>> On Sat, 7 Feb 2009, Tom Cloyd wrote:
>>
>>> Tom Cloyd wrote:
>>>> I'm baffled by this strange outcome - I cannot collapse multiple
>>>> spaces in a text file. This isn't just a regex problem, somehow.
>>>> I'm failing to grasp something essential, but don't know what it is.
>>>> All help appreciated, as usual!
>>>>
>>>> Here is a demo of my problem, in which I try two different ways, and
>>>> both fail:
>>>>
>>>> === code ===
>>>> # h2t.rb
>>>>
>>>> def main
>>>> # conversion table spec
>>>> conv = [
>>>> [ '<h1>', 'h1. ' ], [ '<h2>', 'h2. ' ], [ '<h3>', 'h3. ' ],
>>>> [ '<h4>', 'h4. ' ], [ '<h5>', 'h5. ' ], [ '<h6>', 'h6. ' ], [ /<\/h\d>/, '' ],
>>>> [ " +", ' ' ]] # <= this last array element should do the trick, but doesn't
>>> Ouch. THIS - [ / +/, ' ' ], substituted for [ " +", ' ' ] above fixes
>>> it. I'm going blind, obviously.
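Why the swap fixes it: when `gsub` is given a String as its pattern, Ruby matches that string literally, so `" +"` only finds a space followed by a plus sign. `/ +/` is a Regexp and matches any run of one or more spaces. The rest of Tom's loop isn't shown in the thread, so the sketch below assumes the table is applied one `[pattern, replacement]` pair at a time; `convert` and the sample text are hypothetical.

```ruby
# Hypothetical helper: apply each [pattern, replacement] pair in order.
def convert(text, table)
  table.reduce(text) { |acc, (from, to)| acc.gsub(from, to) }
end

sample = "<h2>Some   title</h2>"

# String pattern: matched literally, so " +" looks for a space then a '+'.
broken = [['<h2>', 'h2. '], [/<\/h\d>/, ''], [" +", ' ']]
# Regexp pattern: / +/ matches one or more consecutive spaces.
fixed  = [['<h2>', 'h2. '], [/<\/h\d>/, ''], [/ +/, ' ']]

p convert(sample, broken)  # prints "h2. Some   title"
p convert(sample, fixed)   # prints "h2. Some title"
```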
>>
>> Just for fun, here's another way to write the method:
>>
>> def main
>> data = File.read("tom.txt")
>> data.gsub!(/<(h[1-6])>/, "\\1. ")
>> data.gsub!(/<\/h\d>/, "")
>> data.squeeze!(' ')
>>
>> open("tom.out", "w") {|f| f.write(data) }
>>
>> end
>>
>> I think that does the same thing. Tweak to taste :-)
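To try David's three steps without reading or writing files, here is a sketch that applies them to a string instead. The method name `h2t` and the sample input are made up for illustration. Note that `squeeze(' ')` only collapses runs of spaces, not tabs or other whitespace.

```ruby
# Hypothetical string-only variant of David's main method.
def h2t(data)
  data = data.gsub(/<(h[1-6])>/, "\\1. ")  # "<h3>" becomes "h3. "
  data = data.gsub(/<\/h\d>/, "")          # drop closing heading tags
  data.squeeze(' ')                        # collapse runs of spaces to one
end

p h2t("<h1>Intro</h1>\n<h3>A   b  c</h3>")
# prints "h1. Intro\nh3. A b c"
```

The non-bang methods leave the caller's string untouched. They also sidestep a gotcha with `gsub!`: it returns nil when nothing was replaced, so its result can't safely be chained.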
>>
>>
>> David
>>
> That's beautifully economical, and reveals a far better grasp of regex
> than I was able to attain last night. However, I'm having trouble with
> this line:
>
> data.gsub!(/<(h[1-6])>/, "\\1. ")
>
> It certainly works, but I don't grasp the "\\1. " part. I haven't yet
> found anything that might shed light on this magic. How does it retain
> the 'h' and whatever digit follows it? It looks somehow like "\\" ==
> retain matched alpha, and the "1" does the same for matched digits, but
> I really haven't a clue. Can you elucidate just a bit?
>
> Thanks!
>
> Tom
>
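Ah... regex! It's easy once you know it =D
The (...) in the regex defines a capture group.
Here the group matches an 'h' followed by one of the digits 1, 2, 3, 4, 5 or 6.
In the second parameter, \1 stands for the text captured by that first group, i.e. whatever /h[1-6]/ matched ("h2" for "<h2>"). It is written with a double backslash because inside double quotes the backslash itself has to be escaped; a bare "\1" would be read as an octal escape instead.
That's it, nothing magic anymore ;)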
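A few lines in irb make the capture group concrete. This is a small sketch; the variable names are illustrative. It also shows two equivalent ways to write the replacement: single quotes, where `'\1'` needs no doubling, and the block form, where the capture is available as `$1`.

```ruby
html = "<h2>Title</h2>"
re   = /<(h[1-6])>/

# The group (h[1-6]) captures "h2"; \1 in the replacement is that text.
p html.match(re)[1]            # prints "h2"
p html.gsub(re, "\\1. ")       # prints "h2. Title</h2>"

# In single quotes a backslash before a digit is kept as-is: '\1' == "\\1".
p html.gsub(re, '\1. ')        # prints "h2. Title</h2>"

# Block form: captures are available as $1, $2, ... inside the block.
p html.gsub(re) { "#{$1}. " }  # prints "h2. Title</h2>"

# Without doubling, "\1" is an octal escape (the byte 0x01), not a backreference.
p "\1" == "\\1"                # prints false
```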