comp.lang.ruby
[ANN] Hpricot 0.6 -- the swift, delightful HTML parser
_why
6/16/2007 12:02:00 AM
Hpricot 0.6 is up on Rubyforge. Or get it from the development gem
server:
  gem install hpricot --source http://code.whytheluck...
Hpricot is a flexible HTML parser written in C, nestled in a nice
Ruby wrapper. Hpricot also takes a lot of extra steps to help you
out.
Hpricot is great for both scraping web sites and altering HTML
safely, with plenty of options for either cleaning up HTML or
leaving unmodified areas untouched.
HPRICOT's HOME PAGE and WIKI
http://code.whytheluck.../hpricot/
HPRICOT's DOCUMENTATION
http://code.whytheluck.../doc/hpricot/
= EXAMPLES =
Loading an HTML page:
  require 'open-uri'
  require 'hpricot'
  doc = Hpricot(open("http://ruby-lang...."))
Fixing HTML into XHTML:
  doc = Hpricot(open("http://ruby-lang...."), :fixup_tags => true)
Placing a number next to each link on a page, preserving the
original HTML as much as possible:
  doc = Hpricot(open("http://ruby-lang...."))
  num = 0
  (doc/"a").append do
    strong " [#{num += 1}]"
  end
  puts doc.to_original_html
(Notice how you can use a simple Ruby syntax for adding HTML tags
inside the block attached to the `append` method!)
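To try the numbering example without fetching a page, here is a minimal offline sketch. It assumes the hpricot gem is installed and skips itself otherwise. The `number_links` helper and the sample markup are made up for illustration and are not part of Hpricot.

```ruby
begin
  require 'hpricot'
rescue LoadError
  warn "hpricot is not installed; skipping the sketch"
end

# Hypothetical helper (not part of Hpricot): appends a running number,
# wrapped in <strong>, inside every link of the given HTML string.
def number_links(html)
  doc = Hpricot(html)
  num = 0
  (doc/"a").append do
    strong " [#{num += 1}]"
  end
  # Serialize while keeping the untouched parts of the markup as written.
  doc.to_original_html
end

if defined?(Hpricot)
  # Made-up sample markup standing in for a downloaded page.
  puts number_links('<p><a href="/one">first</a> and <a href="/two">second</a></p>')
end
```

Passing a string instead of an `open(...)` handle keeps the sketch independent of the network. The exact serialized output may vary between Hpricot versions.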
= CHANGELOG =
* Hpricot for JRuby -- nice work Ola Bini!
* Inline Markaby for Hpricot documents.
* XML tags and attributes are no longer downcased like HTML is.
* New syntax for grabbing everything between two elements using a Range in the search method: (doc/("font".."font/br")) or in nodes_at like so: (doc/"font").nodes_at("*".."br"). Only works with either a pair of siblings or a set of a parent and a sibling.
* Ignore self-closing endings on tags (such as form) which are containers. Treat them like open parent tags. Reported by Jonathan Nichols on the hpricot list.
* Escaping of attributes, yanked from Jim Weirich and Sam Ruby's work in Builder.
* Element#raw_attributes gives unescaped data. Element#attributes gives escaped.
* Added: Elements#attr, Elements#remove_attr, Elements#remove_class.
* Added: Traverse#preceding, Traverse#following, Traverse#previous, Traverse#next.
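As a rough illustration of the new Elements helpers from the list above, the sketch below calls `attr`, `remove_attr`, and `remove_class` on a set of links. It assumes the hpricot gem is installed and skips itself otherwise. The `tidy_links` helper and the sample markup are hypothetical, and the behavior noted in the comments is my understanding of these methods, which may differ between Hpricot versions.

```ruby
begin
  require 'hpricot'
rescue LoadError
  warn "hpricot is not installed; skipping the sketch"
end

# Hypothetical helper (not part of Hpricot) exercising the Elements
# methods named in the changelog on every <a> in the given HTML string.
def tidy_links(html)
  doc = Hpricot(html)
  links = doc/"a"
  # With only a key, attr is assumed to read from the first matched element.
  first_href = links.attr("href")
  # With a key and a value, attr is assumed to set it on every match.
  links.attr("rel", "nofollow")
  links.remove_attr("title")
  links.remove_class("ext")
  { :first_href => first_href, :html => doc.to_html }
end

if defined?(Hpricot)
  # Made-up sample markup for the illustration.
  result = tidy_links('<p><a class="ext" href="http://example.com/" title="Example">one</a> ' \
                      '<a class="ext" href="http://example.org/">two</a></p>')
  puts result[:first_href]
  puts result[:html]
end
```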
Okay, good enough,
_why