lrlebron@gmail.com
11/11/2006 3:23:00 PM
Thanks for your help. I was able to get it working with some Hpricot code:

# tr is a single table row already selected with Hpricot
intCells = tr.search("td").length
1.upto(intCells - 1) do |i|
  print tr.search("td:eq(#{i})").inner_html + ' '
end
thanks,
Luis
David Vallner wrote:
> lrlebron@gmail.com wrote:
> > I am trying to parse an HTML page that has strings that look like this
> >
> > <tr class="bg2" height="17" valign="middle" align="right"><td
> > align="left"></td><td>4</td><td>47</td><td>1</td><td>19</td></tr> to
> > get the numbers inside the table cells.
> >
> > I would like to end up with a simple string that looks like this (for this
> > row)
> > 4 47 1 19
> >
> > The number of table cells in a row that have numbers may vary for
> > different rows.
> > I'm new to Ruby, so bear with me. I'm also learning to use hpricot and
> > have been able to get the table rows using it.
> >
>
> I'd use XPath, I'm not sure if that's doable with hpricot CSS selectors
> or its (admittedly, I think) basic XPath support.
>
> If you know the webpage is valid xhtml, I'd say switch to REXML, if not,
> massage with tidy (maybe hpricot can do this better too) and then switch
> to REXML.
>
> The code would probably be something like (where doc is the REXML document):
>
> bg2_strings = doc.elements.to_a(%{//tr[@class='bg2']}).map { |bg2_row|
>   bg2_row.elements.to_a('td').map { |cell| cell.text }.join(' ').strip.gsub(/\s+/, ' ')
> }
>
> Which might be horribly wrong, because I find REXML's XPath API hard to
> memorise. YMMV. (It also hates the text() axis specifier with a passion,
> whence the second map.)
>
> David Vallner
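
For anyone who finds this thread later, here is a rough, self-contained sketch of the REXML approach David describes, run against the sample row from the original question. The surrounding <table> wrapper, the SAMPLE_HTML constant and the bg2_strings method name are made up for illustration, and it assumes the markup is already well-formed (run it through tidy first if it is not). Empty cells are dropped explicitly instead of relying on strip/gsub, which is a small change from the quoted snippet.

```ruby
require 'rexml/document'

# Sample row from the original question, wrapped in a <table> for illustration.
# It is well-formed, so REXML can parse it without any cleanup.
SAMPLE_HTML = <<~ROWS
  <table>
  <tr class="bg2" height="17" valign="middle" align="right"><td align="left"></td><td>4</td><td>47</td><td>1</td><td>19</td></tr>
  </table>
ROWS

# Hypothetical helper: returns one space-separated string per <tr class="bg2">.
def bg2_strings(source)
  doc = REXML::Document.new(source)
  doc.elements.to_a("//tr[@class='bg2']").map do |row|
    row.elements.to_a('td')
       .map { |cell| cell.text.to_s.strip } # cell.text is nil for an empty <td>
       .reject(&:empty?)                    # drop the empty leading cell
       .join(' ')
  end
end

puts bg2_strings(SAMPLE_HTML)
# prints: 4 47 1 19
```

Because the cells are collected per row, it does not matter how many numeric cells each row has.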