comp.lang.ruby
html parsing using regular expressions
Anthony Walsh
10/25/2006 2:57:00 AM
I'm new to Ruby and trying to use regular expressions to parse an HTML
file. The page is a large table with no spaces in the HTML code. I want
to count the number of times <tr> or <tr 'anything'> occurs. I'm stuck
on trying to match every variety of <tr>.
I've tried
op_file = File.read(htmlfile)
if op_file =~ /(<tr(.*?)>)+/
but it catches the first <tr and matches all the way to the end of the
file. Anyone have any advice on matching and counting?
-Shinkaku
--
Posted via http://www.ruby-...
2 Answers
Austin Ziegler
10/25/2006 5:02:00 AM
On 10/24/06, Anthony Walsh <akakuda@excite.com> wrote:
> I'm new to Ruby and trying to use regular expressions to parse an html
> file.
Don't. Use Hpricot instead. Your brain will thank you for it.
I haven't used Hpricot, but I've heard great things about it; I've
tried to do HTML parsing with regexen, and it's a mug's game.
-austin
--
Austin Ziegler * halostatue@gmail.com * http://www.halo...
               * austin@halostatue.ca * http://www.halo...feed/
               * austin@zieglers.ca
Paul Lutus
10/25/2006 5:20:00 AM
Anthony Walsh wrote:
> I'm new to Ruby and trying to use regular expressions to parse an html
> file. The page is a large table with no spaces in the html code. I want
> to count the number of times <tr> or <tr 'anything'> occurs. I'm stuck
> on trying to match every variety of <tr>
>
> I've tried
>
> op_file = File.read(htmlfile)
> if op_file =~ /(<tr(.*?)>)+/
>
> but it catches the first <tr and matches all the way to the end of the
> file. Anyone have any advice on matching and counting?
You need to tell us whether you have read the replies you received to this
same question when you asked it eight hours ago. I answered your question,
as did several others, but you have not given any indication that you saw
the replies.
Here is one answer:
#!/usr/bin/ruby -w
path = "path-to-HTML-page"
data = File.read(path)
# scan returns every non-overlapping match; .*? stops at the first ">"
array = data.scan(%r{<tr.*?>})
puts array.size # gives a count of occurrences
puts array # shows the matches
Please read replies before posting again.
--
Paul Lutus
http://www.ara...
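For anyone finding this thread later: the scan approach above can be tightened a little. What follows is only a sketch, not part of the original answer. The sample string and the helper name count_tr_tags are made up for illustration, since the real page is not shown in the thread. The pattern adds a word boundary so that longer tag names starting with "tr" are skipped, uses [^>]* so attributes may span line breaks, and ignores case.

```ruby
# Hypothetical sample input; the thread does not show the real HTML page.
html = "<table><tr><td>a</td></tr><TR class='x'><td>b</td></tr><track></table>"

# Hypothetical helper: counts opening <tr> tags, with or without attributes.
#   \b     keeps the pattern from matching longer names such as <track>
#   [^>]*  accepts any attributes, including ones broken across lines
#   /i     matches <TR> as well as <tr>
def count_tr_tags(html)
  html.scan(/<tr\b[^>]*>/i).size
end

puts count_tr_tags(html)
```

Note that =~ only reports the position of the first match, so it cannot count anything by itself; scan collects every match, which is why the answers use it. A regex like this is still only suitable for simple counting and will be fooled by things like a ">" inside an attribute value, which is where a real parser such as Hpricot comes in.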