
comp.lang.ruby

hpricot problem

Henry Maddocks

12/17/2006 9:52:00 AM

Not sure where to send this, sorry if it's not the right place...

The html in the attached file renders 'correctly' in the 3 browsers I
have tried but it tricks hpricot because of the second malformed
comment. When I say correctly I mean I get to see 'Some text'. I
guess it could be argued that this is incorrect. For my application
it would be nice if hpricot behaved like a browser.

Henry

4 Answers

RubyTalk@gmail.com

12/17/2006 4:17:00 PM

On 12/17/06, Henry Maddocks <henryj@paradise.net.nz> wrote:
> Not sure where to send this, sorry if it's not the right place...
>
> The html in the attached file renders 'correctly' in the 3 browsers I
> have tried but it tricks hpricot because of the second malformed
> comment. When I say correctly I mean I get to see 'Some text'. I
> guess it could be argued that this is incorrect. For my application
> it would be nice if hpricot behaved like a browser.
>
> Henry
>
>


I have found that pages saved from Firefox are different from the HTML
Hpricot uses.

You can also check out the tickets for hpricot at
http://code.whytheluckystiff.net/hprico...

Stephen Becker IV

Peter Szinek

12/17/2006 4:55:00 PM

> I have found that pages saved from Firefox are different from the HTML
> Hpricot uses.

Of course they are. If you save a page from Firefox (or IE, or any other
browser), the page gets saved as is, i.e. as the author put it on the
web server, with all the errors and non-conforming, unclosed, or otherwise
malformed tags.

Now, what happens when a browser renders the page? It builds a Document
Object Model (DOM) out of the HTML and renders the DOM. This DOM
conforms to strict rules (i.e. no wild-wild-west HTML crapfest). If you
dumped it to a file as XML, you would have a correct XHTML page,
which would resemble the original HTML as closely as the browser's
DOM-building rules allow (generally very close to the standards in
the case of Mozilla and Opera, crappy in the case of IE, at least before
6; I am not sure about 7).

What Hpricot does is very similar: it builds a DOM of the HTML. (I am
not sure if _why calls this a DOM or something else, but it is an internal
representation of the underlying HTML.) Of course Mozilla DOM != Hpricot
DOM (!= IE DOM != Opera DOM, et cetera), so you can't make
assumptions about what Hpricot does based on what Mozilla does.
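
To illustrate how two parsers can build different results from the same
broken markup, here is a minimal Ruby sketch using only the standard
library. The sample input and both recovery rules are assumptions made up
for illustration: the file attached to the original post is not shown in
this thread, and neither rule is claimed to be what Hpricot or any browser
actually does.

```ruby
# Hypothetical input, NOT the file attached to the original post.
# The second comment is malformed: it is closed with "->" instead of "-->".
SAMPLE = "<p>before</p><!-- ok --><!-- broken -><p>Some text</p><!-- tail -->"

# Strict rule: a comment runs from "<!--" to the next "-->".
# The malformed comment therefore swallows everything up to the
# closing "-->" of the following comment.
def strip_comments_strict(html)
  html.gsub(/<!--.*?-->/m, "")
end

# Lenient rule (an assumed recovery strategy): a comment runs from
# "<!--" to the first "->", so "-->" and the malformed "->" both close it.
def strip_comments_lenient(html)
  html.gsub(/<!--.*?->/m, "")
end

puts strip_comments_strict(SAMPLE)   # prints <p>before</p>
puts strip_comments_lenient(SAMPLE)  # prints <p>before</p><p>Some text</p>
```

With the strict rule 'Some text' disappears into the comment; with the
lenient rule it survives. Which behaviour is "correct" depends entirely on
the recovery rules each parser chooses.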

If you want Mozilla to parse your page and return the DOM (or serialize
it to XML so you can feed it to an XML/XSLT/XPath engine), I can show
you how, but only in Java - unfortunately Ruby's tools are not yet there.

Or, you can use Hpricot and forget about how it works everywhere else...


Cheers
Peter

__
http://www.rubyra...


_why

12/18/2006 6:12:00 AM

On Mon, Dec 18, 2006 at 01:55:26AM +0900, Peter Szinek wrote:
> > I have found that pages saved from Firefox are different from the HTML
> > Hpricot uses.
>
> What Hpricot does is very similar: it builds a DOM of the HTML. (I am
> not sure if _why calls this a DOM or something else, but it is an internal
> representation of the underlying HTML.) Of course Mozilla DOM != Hpricot
> DOM (!= IE DOM != Opera DOM, et cetera), so you can't make
> assumptions about what Hpricot does based on what Mozilla does.

Well, but, I'd actually like to get Hpricot's parser to be close to Firefox's.
So, what I'm saying is: if Hpricot appears to read HTML differently from
Firefox, I'd say that's a bug. Yep, for sure it is.

_why

Peter Szinek

12/18/2006 9:21:00 AM

> Well, but, I'd actually like to get Hpricot's parser to be close to Firefox's.

Wow. Wow. Wow.
I thought you couldn't possibly tell me anything new about Hpricot
that would make me a significantly bigger fanboy, but once again, I was
proven wrong.

Keep up the great work.

Cheers,
Peter

btw. What about the XPath indices? Have you decided on indexing from 0?
__
http://www.rubyra...