comp.lang.ruby

Can't control regular expressions

Guillermo.Acilu

7/29/2008 10:34:00 AM

Hello guys,

I need to extract from an html file all the scripts. So I have written the
following regular expression for a first test:

%r|<script(.+)script>|m

The problem I am having is that the expression takes the first <script and
the very last script>. So it matches the beginning of the first script in
the document and the end of the last script in the document with
everything in the middle. I want to extract just the scripts one by one.
How do I do it?

Thanks for your help,

Guillermo


2 Answers

Lars Christensen

7/29/2008 10:46:00 AM

On Jul 29, 12:34 pm, Guillermo.Ac...@koiaka.com wrote:
> I need to extract from an html file all the scripts. So I have written the
> following regular expression for a first test:
>
> %r|<script(.+)script>|m
>
> The problem I am having is that the expression takes the first <script and
> the very last script>. So it matches the beginning of the first script in
> the document and the end of the last script in the document with
> everything in the middle. I want to extract just the scripts one by one.
> How do I do it?

You can put the '?' modifier after the quantifier to make the match lazy
rather than greedy, so it stops at the first "script>" it reaches:

%r|<script(.+?)script>|m
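
As a rough sketch of the difference, here are both patterns run with String#scan over a small document. The sample HTML and the variable names are made up for illustration; only the two regexps come from this thread.

```ruby
# Hypothetical sample document with two script blocks.
html = <<~HTML
  <html>
  <head>
  <script type="text/javascript">var a = 1;</script>
  <script type="text/javascript">
  var b = 2;
  </script>
  </head>
  </html>
HTML

greedy = %r|<script(.+)script>|m
lazy   = %r|<script(.+?)script>|m

# The greedy pattern runs from the first "<script" to the last "script>",
# so everything in between comes back as one match.
greedy_matches = html.scan(greedy)

# The lazy pattern stops at the first "script>" it can reach,
# so each script block is matched on its own.
lazy_matches = html.scan(lazy)

puts greedy_matches.length   # 1
puts lazy_matches.length     # 2

# With one capture group, scan yields one-element arrays.
lazy_matches.each { |(captured)| puts captured.inspect }
```

Note that the captured text still includes the rest of the opening tag and the leading "</" of the closing tag, since the group sits between "<script" and "script>".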

However, I suggest trying Hpricot for more robust HTML parsing.

Lars

Florian Gilcher

7/29/2008 10:58:00 AM

On Jul 29, 2008, at 12:34 PM, Guillermo.Acilu@koiaka.com wrote:

> Hello guys,
>
> I need to extract from an html file all the scripts. So I have written the
> following regular expression for a first test:
>
> %r|<script(.+)script>|m
>
> The problem I am having is that the expression takes the first <script and
> the very last script>. So it matches the beginning of the first script in
> the document and the end of the last script in the document with
> everything in the middle. I want to extract just the scripts one by one.
> How do I do it?
>
> Thanks for your help,
>
> Guillermo
>


Hi, regexps are not the right tool for this. You can find some
explanation of why that is in this topic:

http://groups.google.com/group/ruby-talk-google/browse_thread/thread/2d86d1...
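
To give a feel for the kind of input that trips a regexp here, a small sketch follows. Both input strings are hypothetical; they are ordinary HTML that the lazy pattern from this thread does not handle the way one would hope.

```ruby
lazy = %r|<script(.+?)script>|m

# Hypothetical input: an unquoted attribute value that ends in "script".
# The pattern finds "script>" inside the opening tag and stops there.
unquoted = "<script language=javascript>var a = 1;</script>"
first = unquoted[lazy]   # String#[] with a Regexp returns the matched text

# Hypothetical input: upper-case tag names, which HTML allows.
# The pattern is case-sensitive, so nothing matches.
upper = "<SCRIPT>var b = 2;</SCRIPT>"
upper_matches = upper.scan(lazy)

puts first.inspect          # "<script language=javascript>"
puts upper_matches.length   # 0
```

The second case could be patched with the /i flag, but the first shows that the pattern has no idea where a tag ends, which is the sort of thing an HTML parser takes care of.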

Regards,
Florian Gilcher