comp.lang.ruby

Ruby multiline regex problem

Gregg Yows

4/8/2008 4:22:00 PM

Code:

"<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1"&...
something here Best</td>"


Pattern:

<td.*?>.*?<\/td\s*>


I'm trying to match this whole block and use it for further parsing.
This started from an example in Brian Marick's book "Everyday
Scripting..." that had to be modified because Amazon has changed their
presentation to tables instead of lists.

Anyway, the regex works fine on a single line. As soon as I introduce
this:

"<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1"&...
something here

Best</td>"

it fails.

When I try this same expression with Perl using the //s mode, it works.
I understand Ruby uses //m (multi-line mode) in nearly the same fashion,
so that "." also matches newlines, so it should work,
right? Can anyone tell me what I am doing wrong here? Why isn't
"multiline" mode working?
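
For reference, here is a minimal sketch of what the /m flag changes in Ruby. The two-line string is a made-up stand-in for the Amazon markup (which is elided above), and the variable names are only for illustration. Note that Ruby's ^ and $ always match at line boundaries, so Ruby's /m corresponds to Perl's /s rather than to Perl's /m.

```ruby
# Made-up sample: a cell whose content spans two lines.
html = "<td>first line\nsecond line</td>"

without_m = /<td.*?>.*?<\/td\s*>/   # "." does not match "\n"
with_m    = /<td.*?>.*?<\/td\s*>/m  # "." matches "\n" too (Perl's /s)

p html[without_m]   # nil
p html[with_m]      # the whole string, newline included
```

If the same pattern fails with /m in some tool, a likely suspect is how that tool passes the flag or the test string, not the pattern itself.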

Thanks!
--
Posted via http://www.ruby-....

5 Answers

Todd Benson

4/8/2008 6:44:00 PM

On Tue, Apr 8, 2008 at 11:21 AM, Gregg Yows <gregg@yows.net> wrote:
> When I try this same expression with perl using the //s mode, it works.
> I understand Ruby uses //m (multi-line mode in nearly the same fashion
> causing newlines to be considered any character, so it should work,
> right? Can anyone tell me what I am doing wrong here? Why isn't
> "multiline" mode working?
>
> Thanks!

<CODE>

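# Sample cell with its content spread over several lines.
s = '<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1"&...
something here

Best</td>'

puts "######\ns:"
puts s

# /m makes "." match newlines as well (Perl's /s).
r1 = /<td.*?>.*?<\/td.*?>/m    # no capture group
r2 = /<td.*?>(.*?)<\/td.*?>/m  # group 1 captures the cell contents

puts "######\nscan with r1:"
puts s.scan(r1)                # array of whole matches
puts
puts "######\nmatch with r1:"
puts s.match(r1)[0]            # [0] is the whole match
puts

s =~ r1
puts "######\n=~ and $1 with r1:"
puts $1                        # nil, because r1 has no group

puts

puts "######\nscan with r2:"
puts s.scan(r2)                # one array of captures per match
puts
puts "######\nmatch with r2:"
puts s.match(r2)[0]
puts

s =~ r2
puts "######\n=~ and $1 with r2:"
puts $1                        # the text between <td ...> and </td>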

</CODE>

Hmm, I'm not sure if the regexp /<td[^>]*>.*?<\/td[^>]*>/m would be
more appropriate or not.
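
A small sketch of how the two tag patterns differ, using made-up inputs (neither string comes from the thread). It assumes only core Ruby: [^>] matches any character except ">", newlines included, so it does not depend on /m, and it cannot run past the end of the tag the way ".*?" can when a later part of the pattern fails.

```ruby
# Made-up inputs for illustration only.
wrapped_tag = "<td class=\"a\"\n    id=\"x\">text</td>"
nested      = "<td><b>bold</b></td>"

dot_tag     = /<td.*?>/     # no /m: "." cannot cross the newline inside the tag
negated_tag = /<td[^>]*>/   # [^>] also matches the newline

p wrapped_tag[dot_tag]      # nil
p wrapped_tag[negated_tag]  # the full opening tag

# When the rest of the pattern fails, ".*?" may grow past the first ">".
p nested[/<td.*?>bold/]     # "<td><b>bold"
p nested[/<td[^>]*>bold/]   # nil
```

On the sample from the original post both forms match the same text, so the choice mostly matters for tags that wrap across lines or for patterns with more after the opening tag.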

Todd

Robert Klemme

4/9/2008 1:17:00 PM

2008/4/8, Gregg Yows <gregg@yows.net>:
> When I try this same expression with perl using the //s mode, it works.
> I understand Ruby uses //m (multi-line mode in nearly the same fashion
> causing newlines to be considered any character, so it should work,
> right? Can anyone tell me what I am doing wrong here? Why isn't
> "multiline" mode working?

Works for me: no match without /m, match with /m:

irb(main):004:0> s=%q{<td align="left" ><div style="width: 165px; height: 175px;"><a
irb(main):005:0' href="http://www.amazon.com/Rails-Rec...77616606/ref=pd_sim_b_njs_img_1"&...
irb(main):006:0' something here Best</td>}
=> "<td align=\"left\" ><div style=\"width: 165px; height: 175px;\"><a\nhref=\"http://www.amazon.com/Rails-Rec...77616606/ref=pd_sim_b_njs_img_1\">testPit\nsomething here Best</td>"
irb(main):007:0> s[%r{<td.*?</td\s*>}]
=> nil
irb(main):008:0> s[%r{<td.*?</td\s*>}m]
=> "<td align=\"left\" ><div style=\"width: 165px; height: 175px;\"><a\nhref=\"http://www.amazon.com/Rails-Rec...77616606/ref=pd_sim_b_njs_img_1\">testPit\nsomething here Best</td>"
irb(main):009:0>
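
The session above relies on String#[] accepting a Regexp. As a rough sketch with a made-up string (not the Amazon markup), the same call can also take a capture-group index as a second argument, which may be handy for the "further parsing" step:

```ruby
# Made-up sample; %q{} keeps the newline typed inside the literal.
s = %q{<td class="title">Rails
Recipes</td>}

whole = s[%r{<td.*?</td\s*>}m]             # entire match, or nil if none
inner = s[%r{<td[^>]*>(.*?)</td\s*>}m, 1]  # contents of capture group 1 only

p whole
p inner
```

With %r{...} the "/" in "</td" needs no backslash, which keeps HTML patterns a little easier to read.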

Cheers

robert

--
use.inject do |as, often| as.you_can - without end

Gregg Yows

4/10/2008 2:53:00 AM

Thanks, folks, for all your help. It turns out that I was using the regex
test view in Eclipse (RDT), which evidently does not behave properly in
multi-line mode. I guess I need to go out and get the Aptana/RadRails
plugin that has the latest RDT and ruby-debug built in. I identified the
issue using Mike Lovitt's Rubular regex tester. Thanks, Mike, for
restarting that server!

http://www.ru...






Robert Klemme

4/10/2008 8:43:00 AM

2008/4/10, Ransom Tullis <gregg@yows.net>:
> Thanks folks for all your help...turns out that I was using the regex
> test view in Eclipse (RDT) which was obviously not behaving properly in
> multi-line mode. I guess I need to go out and get the Aptana/Radrails
> plugin that has the latest RDT and ruby-debug built in. I identified the
> issue using Mike Lovitt's Rubular regex tester. Thanks Mike for
> restarting that server!

Why look so far? IRB serves the same purpose.

Cheers

robert


Gregg Yows

4/10/2008 12:34:00 PM

Robert Klemme wrote:
> Why look so far? IRB serves the same purpose.
>
> Cheers
>
> robert

I'm new to Ruby and IRB. I did test the regex in IRB, but I did not
know that I could enter a string literal containing \n characters the
way you did above. So, of course, my single-line test string was passing
every time. That is very cool! I am growing fonder of IRB every day...
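
For anyone else testing in IRB, a short sketch of a few ways to get a real newline into a test string. The strings and variable names are made up for illustration; only core Ruby literal syntax is assumed.

```ruby
single  = 'a\nb'   # single quotes: the backslash and the "n" stay literal
double  = "a\nb"   # double quotes: \n becomes one newline character
percent = %q{a
b}                 # %q{} keeps a newline typed inside the literal
heredoc = <<EOS
a
b
EOS

p single.length          # 4
p double.length          # 3
p percent == double      # true
p heredoc == "a\nb\n"    # true: a heredoc ends with a trailing newline

p double =~ /a.b/        # nil: without /m, "." does not match the newline
p double =~ /a.b/m       # 0
```

A test string built with single quotes never contains a newline, so a pattern without /m can appear to work on it.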