Robert Klemme
9/10/2008 9:06:00 AM
2008/9/10 Xiong Chiamiov <xiong.chiamiov+ruby_forum@gmail.com>:
> Ruby 1.8.6 with Oniguruma installed and working (everywhere else, this
> seems to be my problem).
>
> Let me preface this by saying that I am new to Ruby (and kinda jumped
> in, rather than learning it properly), and regexes are not my thing -
> that's why I have nifty regex-checkers.
>
> I am trying to extract some parts out of a string
> ("<p><b>'Algebra'</b><br>") that I scraped from some html. I'm getting
> nil returned from the expression:
>
> Oniguruma::ORegexp.new("(?<=<p><b>').*(?='</b><br>)").scan(scraped_html)
>
> with scraped_html being the string mentioned above.
>
> Doing some experimenting, I have found that the first part works just as
> planned (i.e., everything except the lookahead). Using wildcards (. and
> *) works as well:
>
> Oniguruma::ORegexp.new("(?<=<p><b>').*(?=.)").scan(scraped_html)
>
> returns [#<MatchData "Foo'</b><br">, #<MatchData "Bar'</b><br">], as
> expected. However, anything else (<, b, \w, etc.) causes the regex to
> not match.
>
> I am quite befuddled about this, though I (almost certainly) know it is
> my fault. Any help would be much appreciated.
With 1.9:
irb(main):001:0> s="<p><b>'Algebra'</b><br>"
=> "<p><b>'Algebra'</b><br>"
irb(main):002:0> s.scan %r{(?<=<p><b>').*(?='</b><br>)}
=> []
irb(main):003:0> s.scan %r{(?<=<p><b>').*?(?='</b><br>)}
=> ["Algebra"]
Note the non-greedy match (.*? instead of .*). In cases like this I usually prefer:
irb(main):005:0> s.scan %r{<p><b>'(.*?)'</b><br>}
=> [["Algebra"]]
I.e. use a capturing group to extract the part that I am interested in.
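To make the group approach a bit more concrete, here is a minimal sketch that wraps it in a method and runs it on a string with two entries. The names TITLE_RE and extract_titles and the sample string are made up for illustration; they are not from the original post. Since the pattern uses no lookbehind, it should not depend on Oniguruma being installed.

```ruby
# Hypothetical helper (not from the thread): collect every title that
# appears as <p><b>'...'</b><br> in a scraped HTML fragment.
TITLE_RE = %r{<p><b>'(.*?)'</b><br>}

def extract_titles(html)
  # With a group in the pattern, scan returns one array per match,
  # so flatten to get a plain list of strings.
  html.scan(TITLE_RE).flatten
end

html = "<p><b>'Foo'</b><br><p><b>'Bar'</b><br>"  # assumed sample input
p extract_titles(html)  # => ["Foo", "Bar"]
p html[TITLE_RE, 1]     # first match only, group 1 => "Foo"
```

The non-greedy .*? matters here as well: it stops each match at the nearest '</b><br> rather than running on to the last one in the string.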
Kind regards
robert
--
use.inject do |as, often| as.you_can - without end