comp.lang.ruby
Using Hpricot to parse a fiddly table
Adam Dullenty
1/6/2008 7:13:00 PM
Hi there,
I'm fairly new to Ruby. Previously I was an average Java programmer,
so it's all a bit foreign to me, especially XPath and CSS. I would be
grateful if someone could give me a hand with a problem I'm having: I
have a table whose fields I'm trying to extract in a particular way.
The table is in the form:
<table>
  <tr>
    <td>...stuff I don't want...</td>
  </tr>
  <tr>
    <td>
      <table>
------------rows i want
        <tr>
          <td>
            <table>
              <tr>
                <td>Field 1</td>
                <td>Field 2</td>
              </tr>
            </table>
          </td>
          <td>Field 3</td>
          <td>Field 4, Field 5</td>
        </tr>
------------end of rows i want
      </table>
    </td>
  </tr>
</table>
I have managed to get Hpricot to parse the page and return the HTML for
the table, but I'm struggling to get it into an array of the form
["Field 1", "Field 2", "Field 3", "Field 4", "Field 5"] for each row. I
had hoped there would be some kind of built-in method for extracting
data from a table, but I can't find one.
Thanks again, look forward to a reply :)
Adam
--
Posted via http://www.ruby-...
2 Answers
Steve Ross
1/6/2008 8:39:00 PM
For the innermost table, try:
eles = doc.search('table table table td')
For the enclosing table:
eles = doc.search('table table td')
I don't suppose the semantics can be improved any -- like class names
or ids?
Adam Dullenty
1/7/2008 12:49:00 AM
Steve Ross wrote:
> I don't suppose the semantics can be improved any -- like class names
> or ids?
Thanks for your reply. Afraid not, there are no handy names or ids. I
think I was already doing what you posted, in a slightly different form:
elements2 = (elements/"table//table//td"). Since my last post, though,
I've managed to sort it out with lots of array manipulation.
Thanks for the help though :-)
Adam
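
For anyone landing on this thread later: the array manipulation itself was never posted, so here is one possible sketch of it, not Adam's actual code. It takes each second-level row, collects the text of every <td> that has no nested elements (which flattens the innermost table into the row), and splits comma-separated cells such as "Field 4, Field 5". It uses REXML rather than Hpricot, and that is a substitution: REXML only accepts well-formed markup like the sample above, so a real page would likely still need a lenient parser. The names SAMPLE_HTML, leaf_cell_texts and extract_rows are made up for illustration.

```ruby
require 'rexml/document'

# Sample markup copied from the question. It is well-formed, so REXML can parse it.
SAMPLE_HTML = <<~HTML
  <table>
    <tr>
      <td>...stuff I don't want...</td>
    </tr>
    <tr>
      <td>
        <table>
          <tr>
            <td>
              <table>
                <tr>
                  <td>Field 1</td>
                  <td>Field 2</td>
                </tr>
              </table>
            </td>
            <td>Field 3</td>
            <td>Field 4, Field 5</td>
          </tr>
        </table>
      </td>
    </tr>
  </table>
HTML

# Hypothetical helper: text of every <td> below `element` that has no child
# elements, in document order. Cells that wrap a nested table are descended
# into instead of being reported themselves.
def leaf_cell_texts(element)
  element.elements.to_a.flat_map do |child|
    if child.name == 'td' && !child.has_elements?
      [child.text.to_s.strip]
    else
      leaf_cell_texts(child)
    end
  end
end

# Hypothetical helper: one array of fields per row of the second-level table.
def extract_rows(html)
  doc = REXML::Document.new(html)
  # Rows of the table nested one level inside the outer table.
  rows = REXML::XPath.match(doc.root, 'tr/td/table/tr')
  rows.map do |row|
    # Split cells like "Field 4, Field 5" into separate fields.
    leaf_cell_texts(row).flat_map { |text| text.split(/\s*,\s*/) }
  end
end

extract_rows(SAMPLE_HTML).each { |fields| p fields }
# prints ["Field 1", "Field 2", "Field 3", "Field 4", "Field 5"]
```

Splitting on commas assumes no field legitimately contains a comma; if some do, the split would need to be limited to the known multi-value column.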