comp.lang.ruby

[ANN] hpricot 0.5 -- a fast, forgiving HTML reader

_why

2/1/2007 3:34:00 AM

Hi, here's Hpricot 0.5.

gem install hpricot --source http://code.whytheluck...

Hpricot reads HTML pages and works hard to fix them up and give you
everything you need to wind your way around them and hack them up!
Inspired by John Resig's jQuery and Tanaka Akira's HTree.

* Hpricot is standalone. It's dependent on no other libs, just
Ruby.
* Hpricot is fast: its parser is written in C with the help of the
wonderful Ragel state machine compiler.
* However, Hpricot also works hard to fix up HTML and pays a small
penalty to get it right.
* How hard does Hpricot work? My rule is: if Firefox parses it,
Hpricot should too.

This release has a number of really nice features. The new
`to_original_html` method will try to preserve as much of the
original HTML as possible (including its mistakes) while still
merging in your changes. Also, you can test text nodes with syntax
like: `//a[text()='Click Me!']`.
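
As a rough illustration of the text-node test and `to_original_html` described above, here is a minimal sketch. It assumes the hpricot gem is installed and loadable; the sample markup, the `SAMPLE_HTML` constant, and the `click_me_hrefs` helper are made up for this example and are not part of Hpricot.

```ruby
# Sketch only: assumes the hpricot gem is installed.
begin
  require 'rubygems'
  require 'hpricot'
rescue LoadError
  # Hpricot is not available; the helper below then returns nil.
end

# Hypothetical sample page with an unclosed <p>, to give the parser
# something to fix up.
SAMPLE_HTML = <<-EOS
<html><body>
<p><a href="/start">Click Me!</a> <a href="/other">Not me</a>
</body></html>
EOS

# Hypothetical helper: returns the href of every link whose text is
# exactly "Click Me!", or nil when Hpricot could not be loaded.
def click_me_hrefs(html)
  return nil unless defined?(Hpricot)
  doc = Hpricot(html)
  doc.search("//a[text()='Click Me!']").map { |a| a['href'] }
end

p click_me_hrefs(SAMPLE_HTML)

# Print the document back out, keeping the original markup where possible.
puts Hpricot(SAMPLE_HTML).to_original_html if defined?(Hpricot)
```

The helper returns plain strings rather than Hpricot elements so the result is easy to inspect; that is a choice made for this sketch, not something the library requires.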

Should appear on Rubyforge soon enough. Thank you to all the
ticketeers and patchistadores out there, especially Leslie Wu who's
been punching that commit button like she's doin the turtle trap!!

_why

5 Answers

Bernard Kenik

2/1/2007 4:35:00 PM

On Jan 31, 10:33 pm, _why <w...@ruby-lang.org> wrote:
> Hi, here's Hpricot 0.5.
> [...]

just downloaded hpricot ... no warnings when I run my script.

however, the Rdoc flag is still turned off .. documentation please!!!

C:\..\Owner>gem query -n hpricot -r -s http://code.whytheluckystiff.net

*** REMOTE GEMS ***
Need to update 2 gems from http://code.whytheluck...
...
complete

hpricot (0.5, 0.4.99, 0.4.92, 0.4.90, 0.4.86, 0.4.76, 0.4.59, 0.4.52,
0.4.47, 0.4.43, 0.4, 0.3.32, 0.3, 0.2, 0.1)
a swift, liberal HTML parser with a fantastic library

C:\..\Owner>gem install hpricot --source http://code.whytheluckystiff.net
Select which gem to install for your platform (i386-mswin32)
1. hpricot 0.5 (ruby)
2. hpricot 0.5 (mswin32)
3. hpricot 0.5 (ruby)
4. hpricot 0.5 (mswin32)
5. Skip this gem
6. Cancel installation
> 2
Successfully installed hpricot-0.5-mswin32

C:\Documents and Settings\Owner>

RubyGems Documentation Index

hpricot 0.4 [rdoc] [www]
a swift, liberal HTML parser with a fantastic library


hpricot 0.4.99 [rdoc] [www]
a swift, liberal HTML parser with a fantastic library


hpricot 0.5 [rdoc] [www]
a swift, liberal HTML parser with a fantastic library

In all of these listings, only the "www" link is active; the "rdoc" links are not.

_why

2/1/2007 4:52:00 PM

On Fri, Feb 02, 2007 at 01:40:06AM +0900, bbiker wrote:
> however, the Rdoc flag is still turned off .. documentation please!!!
> [...]
> Successfully installed hpricot-0.5-mswin32

Oh, I see. This was for the windows one. Well, in the meantime you
can also use: http://code.whytheluckystiff.net/do.... Or
I've updated the gem on my personal repository.

_why

Bernard Kenik

2/1/2007 5:27:00 PM

On Feb 1, 11:51 am, _why <w...@ruby-lang.org> wrote:
> On Fri, Feb 02, 2007 at 01:40:06AM +0900, bbiker wrote:
> > however, the Rdoc flag is still turned off .. documentation please!!!
> > [...]
> > Successfully installed hpricot-0.5-mswin32
>
> Oh, I see. This was for the windows one. Well, in the meantime you
> can also use: http://code.whytheluck.../do.... Or

Thanks for the link to the rdoc documentation.


> I've updated the gem on my personal repository.

http://code.whytheluck... Isn't this your personal
repository?

Just uninstalled and reinstalled Hpricot; still no RDoc.

I don't mean to be a pain in the a.

Thanks for your patience

Bernard Kenik


> _why


Tim Hunter

5/11/2007 1:15:00 AM

Ron M wrote:
> Could hpricot die more gracefully and still parse the document
> leaving only that element invalid when it sees such very large
> attributes?
>

What did why say when you posted this problem to the hpricot mailing list?

http://code.whytheluckystiff.ne...

--
RMagick [http://rmagick.rub...]
RMagick Installation FAQ [http://rmagick.rub.../install-faq.html]


_why

5/11/2007 7:08:00 AM

On Fri, May 11, 2007 at 09:33:07AM +0900, Ron M wrote:
> /usr/local/lib/ruby/gems/1.8/gems/hpricot-0.5/lib/hpricot/parse.rb:44:in `scan': ran out of buffer space on element <input>, starting on line 23. (Hpricot::ParseError)

Oh, you can increase the buffer size with: Hpricot.buffer_size = 262144
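
A minimal sketch of how that setting might be applied before parsing a page with a very large attribute. It assumes the hpricot gem is installed; the `parse_with_buffer` helper and the sample form markup are hypothetical and exist only for this example.

```ruby
# Sketch only: assumes the hpricot gem is installed.
begin
  require 'rubygems'
  require 'hpricot'
rescue LoadError
  # Hpricot is not available; the helper below then returns nil.
end

# Hypothetical helper: raise the parser buffer, then parse.
# The default of 262144 bytes is the value suggested in the post above.
def parse_with_buffer(html, bytes = 262_144)
  return nil unless defined?(Hpricot)
  Hpricot.buffer_size = bytes
  Hpricot(html)
end

# Made-up input: one <input> element carrying a 100,000-character attribute.
big_value = 'x' * 100_000
doc = parse_with_buffer(%(<form><input name="blob" value="#{big_value}"></form>))
puts doc.at('input')['name'] if doc
```

The buffer size is set once on the `Hpricot` module rather than per call, so it is worth setting it before any parsing starts.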

_why