comp.lang.ruby
HTML table to matrix with WWW::Mechanize
Adam Hinchliffe
11/12/2006 7:15:00 PM
Hi,
I am new to Ruby and am trying to scrape a website table into a matrix.
I have been playing around with WWW::Mechanize and have had some success
getting the page, extracting the table I want, and then separating the
result into table rows. The problem comes when splitting each row into
table cells: I go through each row in the array to break it up by
the <td> tag, but my code appears to have zero effect!
The code I am using is below; any help in getting the table into a
matrix would be really appreciated.
Thanks
Adam
require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'
page = agent.get("http://horses.sportinglife.com/Racecards/0,12495,215137,00....").
  search("//table[@class='racecard_table']")
tablerows = page.search("//tr")
puts tablerows.length
finalresult = Array.new
tablerows.each do |row|
  finalresult << row.search("//td")
end
puts finalresult.length
1 Answer
Paul Lutus
11/12/2006 7:30:00 PM
Adam Hinchliffe wrote:
> Hi,
>
> I am new to Ruby and am trying to scrape a website table into a matrix.
> I have been playing around with WWW::Mechanize and have had some success
> getting the page, extracting the table I want, and then separating the
> result into table rows. The problem comes when splitting each row into
> table cells: I go through each row in the array to break it up by
> the <td> tag, but my code appears to have zero effect!
>
> The code I am using is below; any help in getting the table into a
> matrix would be really appreciated.
/ ... snip code listing
Try this:
-----------------------------------------
#!/usr/bin/ruby -w
table = "<table><tr>\n" +
"<td>4</td><td>47</td><td>1</td><td>19</td></tr>\n" +
"<tr><td>7</td><td>49</td><td>4</td><td>39</td></tr>\n" +
"<tr><td>14</td><td>17</td><td>19</td><td>21</td>\n" +
"</tr></table>\n"
rows = table.scan(%r{<tr>.*?</tr>}m)
rows.each do |row|
  fields = row.scan(%r{<td>(.*?)</td>}m)
  puts fields.join(",")
end
-----------------------------------------
Output:
4,47,1,19
7,49,4,39
14,17,19,21
Try this filter on a more complex page, one with differing numbers of cells
in each row, etc. It works quite well, and you can understand what it is
doing at a glance.
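One caveat when trying this on a real page: the patterns above match only bare <tr> and <td> tags, so a row such as <tr class="odd"> would be skipped. The following is a minimal sketch, not a definitive implementation, of the same filter with patterns that allow attributes and ignore case. The sample markup and the method name table_rows are made up for illustration, and a regex approach like this still assumes well-formed, non-nested tables.

```ruby
# Hypothetical sample: rows of differing lengths, some tags carrying attributes.
html = <<~HTML
  <table class="racecard_table">
  <tr class="odd"><td>4</td><td align="right">47</td></tr>
  <tr><td>7</td><td>49</td><td>4</td></tr>
  <tr><td>14</td></tr>
  </table>
HTML

# Return one array of cell strings per table row.
# [^>]* permits optional attributes; /m lets "." span newlines; /i ignores case.
def table_rows(html)
  html.scan(%r{<tr\b[^>]*>(.*?)</tr>}mi).flatten.map do |row_body|
    row_body.scan(%r{<td\b[^>]*>(.*?)</td>}mi).flatten.map { |cell| cell.strip }
  end
end

table_rows(html).each { |cells| puts cells.join(",") }
```

Each row keeps however many cells it actually has, so the result is a ragged array of arrays rather than a strict rectangle.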
And notice that most of the listing is sample data; the actual filter code
consists of five lines of Ruby.
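Since the original question asked for a matrix rather than printed lines, here is a hedged sketch of how the same two scans could collect the cells into a nested array instead. The variable name matrix and the conversion with Integer() are assumptions made for this sample data, where every cell is a whole number; real racecard cells would likely need to stay as strings.

```ruby
# Same sample data as in the listing above.
table = "<table><tr>\n" +
        "<td>4</td><td>47</td><td>1</td><td>19</td></tr>\n" +
        "<tr><td>7</td><td>49</td><td>4</td><td>39</td></tr>\n" +
        "<tr><td>14</td><td>17</td><td>19</td><td>21</td>\n" +
        "</tr></table>\n"

# Build an array of rows, each row an array of integers.
matrix = table.scan(%r{<tr>.*?</tr>}m).map do |row|
  # scan with one capture group yields [["4"], ["47"], ...]; flatten it first.
  row.scan(%r{<td>(.*?)</td>}m).flatten.map { |cell| Integer(cell) }
end

p matrix            # nested array, one inner array per table row
puts matrix[1][2]   # cell at row 1, column 2 (both zero-based)
p matrix.transpose  # columns as rows; requires every row to have equal length
```

A plain array of arrays is usually enough for row and column lookups; transpose only works when the table is rectangular.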
The important thing to remember about Ruby is that actually writing code is
so efficient that it is hard to justify applying a predefined package to
some of the simpler processing tasks.
--
Paul Lutus
http://www.ara...