Mikel Lindsaar
10/25/2007 7:37:00 AM
You can try each_child, which iterates over the document's direct child nodes (text nodes and elements alike).
I'll use each_child_with_index to show you what I mean:
require 'hpricot'

# Put your raw HTML text into @text
@parsed_html = Hpricot(@text)
@parsed_html.each_child_with_index do |c, i|
  puts "Line #{i}: #{c.to_s.strip}"
end
Produces:
Line 0: This is one line of text
Line 1: <br />
Line 2: This is another line of text
Line 3: <br />
Line 4: It keeps going on like this
Line 5: <br />
Line 6:
Line 7: <br />
Line 8: Until a new paragraph is started
Line 9: <br />
Line 10: Otherwise, it's just more of the same
Line 11: <br />
Line 12:
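To get from that listing to actual lines and paragraphs, you can treat each <br /> as the end of a line, and an empty line (two <br /> tags with only whitespace between them, like Line 6 above) as a paragraph break. Here's a minimal sketch of that grouping. It assumes flat markup like yours (plain text separated only by <br /> tags), and the method name paragraphs_from_children is made up for the example, not part of Hpricot:

```ruby
# Matches a <br>, <br/> or <br /> node as printed by c.to_s.strip.
BR_TAG = %r{\A<br\s*/?>\z}i

# Hypothetical helper: groups the stripped child strings printed by the
# each_child_with_index loop into paragraphs (arrays of lines).
# Assumption: flat markup, i.e. text separated only by <br /> tags.
def paragraphs_from_children(nodes)
  paragraphs = [[]]
  current = ""
  nodes.each do |node|
    if node =~ BR_TAG
      if current.empty?
        # Nothing between two <br /> tags: start a new paragraph.
        paragraphs << [] unless paragraphs.last.empty?
      else
        # A <br /> ends the current line of text.
        paragraphs.last << current
      end
      current = ""
    else
      current += node
    end
  end
  # Keep any trailing text that was not followed by a <br />.
  paragraphs.last << current unless current.empty?
  paragraphs.reject(&:empty?)
end

# The same sequence as the listing above.
nodes = [
  "This is one line of text", "<br />",
  "This is another line of text", "<br />",
  "It keeps going on like this", "<br />",
  "", "<br />",
  "Until a new paragraph is started", "<br />",
  "Otherwise, it's just more of the same", "<br />",
  ""
]
paragraphs_from_children(nodes).each_with_index do |para, n|
  puts "Paragraph #{n}: #{para.join(' | ')}"
end
# Paragraph 0: This is one line of text | This is another line of text | It keeps going on like this
# Paragraph 1: Until a new paragraph is started | Otherwise, it's just more of the same
```

To feed it from Hpricot, collect the same strings the loop above prints, e.g. nodes = []; @parsed_html.each_child { |c| nodes << c.to_s.strip }, then call paragraphs_from_children(nodes). This only handles the flat case shown here; inline tags such as <b> inside a line would need more care.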
Hope that helps.
Mikel
On 10/25/07, Just Another Victim of the Ambient Morality
<ihatespam@hotmail.com> wrote:
> I'm having trouble understanding Hpricot (thanks to an abominable lack
> of documentation). I'm trying to parse HTML of the following nature:
>
>
> This is one line of text<br />
> This is another line of text<br />
> It keeps going on like this<br />
> <br />
> Until a new paragraph is started<br />
> Otherwise, it's just more of the same<br />
>
>
> I know, it looks simple but, frankly, I have no clue how to parse this
> with Hpricot. Particularly, I don't know how to single out the lines of
> text in between the "br" tags. This is important 'cause I need to know
> where the line breaks are in the text, as well as the new paragraphs.
> Does anyone know how to do this with Hpricot?
> Thank you...