Lars Christensen
7/29/2008 10:46:00 AM
On Jul 29, 12:34 pm, Guillermo.Ac...@koiaka.com wrote:
> I need to extract from an html file all the scripts. So I have written the
> following regular expression for a first test:
>
> %r|<script(.+)script>|m
>
> The problem I am having is that the expression takes the first <script and
> the very last script>. So it matches the beginning of the first script in
> the document and the end of the last script in the document with
> everything in the middle. I want to extract just the scripts one by one.
> How do I do it?
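You can put the '?' regexp operator after the quantifier to make the match
lazy (non-greedy) rather than greedy, so each match stops at the first
"script>" instead of the last one: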
%r|<script(.+?)script>|m
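Here is a small sketch of the difference using String#scan. The sample HTML is made up for illustration. The tighter `bodies` pattern, which skips the attributes and the closing tag, is my own variant and not part of the answer above. The `m` option is what lets `.` cross newlines in Ruby, which multi-line script blocks need.

```ruby
# Hypothetical sample document with two script elements.
html = <<-HTML
<html><head>
<script type="text/javascript">
  var a = 1;
</script>
</head><body>
<p>Some text</p>
<script>var b = 2;</script>
</body></html>
HTML

# Greedy: .+ runs on to the LAST "script>", so one match swallows both scripts.
greedy = html.scan(%r|<script(.+)script>|m)

# Lazy: .+? stops at the FIRST "script>" after each "<script".
lazy = html.scan(%r|<script(.+?)script>|m)

# Variant (assumption, not from the post): capture only the script body
# by skipping the opening tag's attributes and matching the full closing tag.
bodies = html.scan(%r|<script[^>]*>(.*?)</script>|m).flatten.map(&:strip)

puts greedy.size  # => 1
puts lazy.size    # => 2
p bodies          # => ["var a = 1;", "var b = 2;"]
```

With the answer's pattern each capture still includes the opening tag's attributes and the trailing "</", which is why the variant anchors on the full tags. Either way, a regexp will be fooled by things like a "</script>" inside a JavaScript string, which is where a real HTML parser earns its keep.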
However, I suggest trying Hpricot for more robust HTML parsing.
Lars