byronsalty
6/16/2007 7:52:00 PM
On Jun 16, 2:40 pm, "Axel Etzold" <AEtz...@gmx.de> wrote:
> regexp=/.../ <= searched for, such that:
> string="<NC> In North Carolina </NC>"
> ref=regexp.match(string)
> p ref[1] => "In North Carolina"
This will work pretty well (works for the above):
/<\w+>(.*?)<\/\w+>/
The only fancy part is making the .* non-greedy by appending a ?,
which gives .*?. A non-greedy match takes the shortest possible match
instead of the longest.
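A quick sketch of the difference, in case it helps. The sample string with two sibling clauses and the variable names are made up for illustration; only the non-greedy pattern comes from above.

```ruby
# Two sibling clauses on one line (hypothetical sample data).
str = "<NC>In North Carolina</NC> <SC>In South Carolina</SC>"

lazy   = /<\w+>(.*?)<\/\w+>/   # non-greedy: stops at the first closing tag
greedy = /<\w+>(.*)<\/\w+>/    # greedy: runs on to the last closing tag

lazy_capture   = lazy.match(str)[1]
greedy_capture = greedy.match(str)[1]

p lazy_capture    # => "In North Carolina"
p greedy_capture  # => "In North Carolina</NC> <SC>In South Carolina"
```

The greedy version swallows the first closing tag and the whole second clause, which is why the ? matters as soon as there is more than one clause in the string.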
But it will not do what I think you would want on a string with nested
clauses: the match ends at the first closing tag it finds, even if
that tag belongs to an inner clause. If you want to include the inner
clauses, you need to make sure that the close tag matches the open
tag. The side effect is that the tag name needs its own capture group,
so the text you are after moves from [1] to [2].
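To see the problem concretely, here is a small sketch that runs the first pattern against a nested string. The variable names are only for illustration.

```ruby
simple = /<\w+>(.*?)<\/\w+>/
nested = "<NC>In North Carolina <FOO>adsf</FOO> </NC>"

# Any closing tag ends the match, so the capture stops at </FOO>
# and the rest of the NC clause is lost.
simple_capture = simple.match(nested)[1]
p simple_capture   # => "In North Carolina <FOO>adsf"
```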
So consider:
/<(\w+)>(.*?)<\/\1>/
Example:
irb(main):033:0> str = "<NC>In North Carolina <FOO>adsf</FOO> </NC>"
=> "<NC>In North Carolina <FOO>adsf</FOO> </NC>"
irb(main):034:0> re = /<(\w+)>(.*?)<\/\1>/
=> /<(\w+)>(.*?)<\/\1>/
irb(main):035:0> re.match(str)[1]
=> "NC"
irb(main):036:0> re.match(str)[2]
=> "In North Carolina <FOO>adsf</FOO> "
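If the string holds several clauses, one possible way to pull them all out is String#scan, which returns one array of captures per match when the pattern has groups. This is only a sketch: the input strings and variable names are invented for the example.

```ruby
re = /<(\w+)>(.*?)<\/\1>/

# Hypothetical input: two top-level clauses, the first one with a nested tag.
str = "<NC>In North Carolina <FOO>adsf</FOO> </NC> <SC>In South Carolina</SC>"

# With capture groups in the pattern, scan yields [tag, body] pairs.
pairs = str.scan(re)
pairs.each { |tag, body| puts "#{tag}: #{body.strip}" }

# Caveat: a tag nested inside a tag of the SAME name still ends the match
# early, because the first </NC> already satisfies the back-reference.
same = "<NC>outer <NC>inner</NC> tail</NC>"
same_body = re.match(same)[2]
p same_body   # => "outer <NC>inner"
```

So the back-reference handles differently named inner tags, but same-name nesting probably needs a real parser rather than a single regex.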
Does that help?