Robert Dober
6/10/2007 12:45:00 PM
On 6/10/07, Logan Capaldo <logancapaldo@gmail.com> wrote:
> On 6/10/07, Robert Dober <robert.dober@gmail.com> wrote:
> >
> > On 6/10/07, Trochalakis Christos <yatiohi@ideopolis.gr> wrote:
> > > Hello!
> > >
> > > I want to parse a tagged string like this: "<i>this is</i><i>my
> > > string</i>"
> > >
> > > I am doing:
> > >
> > > >> "<i>this is</i><i>my string</i>".scan(/<i>(.*)<\/i>/)
> > > => [["this is</i><i>my string"]]
> > >
> > > What I want is a regex that will return the *first* segment that
> > > matches.
> > > In the above case -> [["this is", "my string"]]
> > >
> > > Is there any way to do this?
> > >
> > > Thanks!
> > >
> > >
> > >
> > This is a FAQ, and yes I will give the solution ;)
> > Regexps are greedy by default: they consume as many characters as
> > possible. There are a few possibilities (not tested):
> >
> > (1) use non-greedy matches
> > "<i>this is</i><i>my string</i>".scan(/<i>(.*?)<\/i>/)
> > (2) use less general expressions
> > "<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*)<\/i>/)
> > (3) Combine both ;)
> > "<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*?)<\/i>/)
>
>
> Unless you want to match strings like <i><foo</i>, it would be simpler to
> just use [^<]*, and not .[^<]*. Note that .[^<]* will also not match
> <i></i>. If the intent was to make the regexp not match that, a better
> regexp would be [^<]+.
Thanks for correcting my typos.
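
For the archive, here is a minimal sketch of the variants discussed above, using the original example string. The variable names and the two edge-case strings are only illustrative, and the negated character class assumes the tagged text never contains a literal "<".

```ruby
s = "<i>this is</i><i>my string</i>"

# Greedy: .* runs to the last </i> in the string.
greedy = s.scan(/<i>(.*)<\/i>/)
p greedy        # => [["this is</i><i>my string"]]

# (1) Non-greedy: .*? stops at the first </i> it can reach.
non_greedy = s.scan(/<i>(.*?)<\/i>/)
p non_greedy    # => [["this is"], ["my string"]]

# (2) Less general: [^<]* cannot cross a "<", so it stops on its own.
negated = s.scan(/<i>([^<]*)<\/i>/)
p negated       # => [["this is"], ["my string"]]

# Edge cases raised in the correction (illustrative inputs):
empty_tag = "<i></i>"
odd_tag   = "<i><foo</i>"

dot_empty  = empty_tag.scan(/<i>(.[^<]*)<\/i>/)  # leading . needs one char
star_empty = empty_tag.scan(/<i>([^<]*)<\/i>/)   # * accepts the empty body
plus_empty = empty_tag.scan(/<i>([^<]+)<\/i>/)   # + rejects the empty body
dot_odd    = odd_tag.scan(/<i>(.[^<]*)<\/i>/)    # leading . swallows the "<"
star_odd   = odd_tag.scan(/<i>([^<]*)<\/i>/)     # [^<]* refuses the "<"

p dot_empty     # => []
p star_empty    # => [[""]]
p plus_empty    # => []
p dot_odd       # => [["<foo"]]
p star_odd      # => []
```

Since scan returns one sub-array per match when the pattern has a single group, calling .flatten on the result gives a flat ["this is", "my string"] if that is more convenient.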
Robert
--
You see things; and you say Why?
But I dream things that never were; and I say Why not?
-- George Bernard Shaw