comp.lang.ruby

A simple Hpricot text setter

Chris Gehlker

8/10/2006 6:19:00 PM

If anyone is trying to use Hpricot to clean up the actual content of
a site while leaving the markup alone, they might find the following
tiny method useful:

class Hpricot::Text
  # Adds a simple Hpricot method to change
  # the text embedded in an HTML document
  #
  # Example of use:
  #   body.traverse_text do |text|
  #     text_out = text.to_s
  #     # manipulate text_out
  #     text.set(text_out)
  #   end
  def set(string)
    @content = string
    self.raw_string = string
  end
end

The trick is to set both @content in Hpricot::Text and @raw_string in
its parent.
--
The folly of mistaking a paradox for a discovery, a metaphor for a
proof, a torrent of verbiage for a spring of capital truths, and
oneself for an oracle, is inborn in us.
-Paul Valery, poet and philosopher (1871-1945)
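
To illustrate why both assignments matter, here is a minimal, self-contained sketch. It does not use Hpricot: ToyText is a hypothetical stand-in for a node that keeps a parsed value (@content) alongside a cached copy of the source text (@raw_string) that is preferred on output. The class name, the extra set_content_only method, and the output rule are all assumptions made for illustration, not Hpricot's actual internals.

```ruby
# Hypothetical stand-in for a text node; not Hpricot's real class.
class ToyText
  attr_accessor :raw_string # cached source text, preferred on output

  def initialize(content)
    @content = content
    @raw_string = content
  end

  # Parsed value, as seen by code that inspects the node.
  def to_s
    @content
  end

  # Serialized form: uses the cached source when one is present.
  def output
    @raw_string || @content
  end

  # Updates only the parsed value, so the cached source goes stale.
  def set_content_only(string)
    @content = string
  end

  # Updates both, mirroring the two assignments in the post above.
  def set(string)
    @content = string
    self.raw_string = string
  end
end

stale = ToyText.new("helo")
stale.set_content_only("hello")

fresh = ToyText.new("helo")
fresh.set("hello")

puts stale.output # prints "helo": the stale cached text wins
puts fresh.output # prints "hello"
```

Under these assumptions, changing only @content would appear to work when the node is inspected, but the old text would still be written out.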



6 Answers

why the lucky stiff

8/12/2006 12:20:00 AM

On Fri, Aug 11, 2006 at 03:19:13AM +0900, Chris Gehlker wrote:
> If anyone is trying to use Hpricot to clean up the actual content of
> a site while leaving the markup alone, they might find the following
> tiny method useful:
>
> class Hpricot::Text
>   # Adds a simple Hpricot method to change
>   # the text embedded in an HTML document
>   #
>   # Example of use:
>   #   body.traverse_text do |text|
>   #     text_out = text.to_s
>   #     # manipulate text_out
>   #     text.set(text_out)
>   #   end
>   def set(string)
>     @content = string
>     self.raw_string = string
>   end
> end

You can also use Elements#inner_html= and Element#inner_html= for this.

(body/:a).inner_html = "New Link Text"

Also: set, html, remove, append, prepend, before, after, and wrap, which all
work just like their JQuery cousins.[1]

Thank you for using Hpricot; it helps all the horses' hearts when you do.

_why

[1] http://jquery.com/...

Chris Gehlker

8/12/2006 2:23:00 AM


On Aug 11, 2006, at 5:20 PM, why the lucky stiff wrote:

> You can also use Elements#inner_html= and Element#inner_html= for
> this.
>
> (body/:a).inner_html = "New Link Text"
>
> Also: set, html, remove, append, prepend, before, after, and wrap, which all
> work just like their JQuery cousins.[1]

Thanks for responding, why, and thanks very much for Hpricot.

I'm a long way from completely understanding Hpricot, but I did try to
use inner_html in what I thought was the correct way.

Here is a little sample program:

require 'rubygems'
require_gem 'hpricot'

doc = Hpricot(open('TestFile.html'))
body = doc.search('body')
body.each {|elmnt| elmnt.inner_html}
body.inner_html
(body/:a).inner_html = "New Link Text"
puts doc

The output is:
testHpricot.rb:6: undefined method `inner_html' for #<Hpricot::Elem:0x7546bc> (NoMethodError)
        from testHpricot.rb:6:in `each'
        from testHpricot.rb:6

If I comment out the body.each... line I get:

testHpricot.rb:7: undefined method `inner_html' for #<Hpricot::Elements:0x753d48> (NoMethodError)

If I comment out that line, I get:

testHpricot.rb:8: undefined method `inner_html=' for []:Array (NoMethodError)


What may be related is that the file text.rb is at:
/usr/local/lib/ruby/gems/1.8/gems/hpricot-0.3/lib/hpricot/text.rb
but it is not actually being required anywhere in Hpricot. When I
tried to require it manually, I found that it was requiring files
that gem didn't give me. This is all in Hpricot 0.3.

Thanks again for both your time and Hpricot.
--
Seven Deadly Sins? I thought it was a to-do list!


why the lucky stiff

8/14/2006 4:29:00 PM

On Sat, Aug 12, 2006 at 11:23:14AM +0900, Chris Gehlker wrote:
> What may be related is that the file text.rb is at:
> /usr/local/lib/ruby/gems/1.8/gems/hpricot-0.3/lib/hpricot/text.rb
> but it is not actually being required anywhere in Hpricot. When I
> tried to require it manually, I found that it was requiring files
> that gem didn't give me. This is all in Hpricot 0.3.

Okay, yeah, you'll need the latest Hpricot (0.4.43):

gem install hpricot --source code.whytheluckystiff.net

Also, don't forget to remove `require_gem 'hpricot'` and use, instead,
`require 'hpricot'`.

_why
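
As a small sketch of the plain-require approach described above: the snippet below loads the gem with require and reports a missing installation instead of crashing. The helper name hpricot_available? is hypothetical, and the message text is only an illustration. On Ruby 1.8 a preceding require 'rubygems' is also needed; from Ruby 1.9 on, RubyGems is loaded automatically.

```ruby
# Hypothetical helper: reports whether the hpricot gem can be loaded
# with a plain require on this machine.
def hpricot_available?
  require 'hpricot'
  true
rescue LoadError
  false
end

if hpricot_available?
  puts "hpricot loaded"
else
  puts "hpricot is not installed; try: gem install hpricot"
end
```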

Chris Gehlker

8/16/2006 4:05:00 AM


On Aug 14, 2006, at 9:29 AM, why the lucky stiff wrote:

> Okay, yeah, you'll need the latest Hpricot (0.4.43):
>
> gem install hpricot --source code.whytheluckystiff.net
>
> Also, don't forget to remove `require_gem 'hpricot'` and use, instead,
> `require 'hpricot'`.
>
> _why

You seem to be making great progress with Hpricot, committing changes
every day.

Yep, 'require_gem' no longer works. Just using 'require' seems better.

I don't know that I communicated my idea behind adding a set method
for Hpricot::Text. There are times when one wants to scan and
potentially change everything that's *not* markup. The markup should
be left unchanged or modified only in trivial ways such as changing
the order of attribute declarations.

Hpricot::Traverse#traverse_text is great for finding all the stuff
that's *not* markup, the pcdata, in an HTML file. I just added a
method to change that data.

You suggested using inner_html=, but the only way I can see that
working is to parse the tree looking for those elements which have
only Hpricot::Text children and then use inner_html= on them. But
that would involve essentially recreating
Hpricot::Traverse#traverse_text to find such elements, although the
common code could mostly be factored out.
--
And those who were seen dancing were thought to be insane by those
who could not hear the music.
-Friedrich Wilhelm Nietzsche, philosopher (1844-1900)
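
The idea of changing only the text nodes can be sketched without Hpricot. The ToyText and ToyElem classes below are hypothetical stand-ins, not Hpricot's classes, and this traverse_text is an assumed reimplementation of the concept: walk the tree depth first, yield every text node, and never touch the element tags. The sample tree uses mixed content (a paragraph holding both text and a link), which is the case where replacing an element's whole inner HTML would be awkward.

```ruby
# Hypothetical stand-ins for a parsed HTML tree; not Hpricot's classes.
class ToyText
  def initialize(content)
    @content = content
  end

  # Replace the text held by this node.
  def set(string)
    @content = string
  end

  def to_s
    @content
  end
end

class ToyElem
  attr_reader :name, :children

  def initialize(name, *children)
    @name = name
    @children = children
  end

  # Yield every text node below this element, depth first.
  def traverse_text(&block)
    @children.each do |child|
      if child.is_a?(ToyText)
        block.call(child)
      else
        child.traverse_text(&block)
      end
    end
  end

  # Serialize the element and its children back to markup.
  def to_s
    "<#{@name}>#{@children.map(&:to_s).join}</#{@name}>"
  end
end

# Mixed content: the <p> has two text children and one element child.
body = ToyElem.new("body",
  ToyElem.new("p",
    ToyText.new("see "),
    ToyElem.new("a", ToyText.new("this link")),
    ToyText.new(" for details")))

body.traverse_text { |text| text.set(text.to_s.upcase) }

puts body # prints <body><p>SEE <a>THIS LINK</a> FOR DETAILS</p></body>
```

Every piece of text is rewritten in place while the tag structure comes out exactly as it went in.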


why the lucky stiff

8/16/2006 7:25:00 PM

On Wed, Aug 16, 2006 at 01:04:46PM +0900, Chris Gehlker wrote:
> Hpricot::Traverse#traverse_text is great for finding all the stuff
> that's *not* markup, the pcdata, in an HTML file. I just added a
> method to change that data.

Okay, I get it. I guess I need to get //div[contains(text(), '...')]
working. Be assured, traverse_text will stick around.

_why

Chris Gehlker

8/16/2006 10:00:00 PM


On Aug 16, 2006, at 12:25 PM, why the lucky stiff wrote:

> Okay, I get it. I guess I need to get //div[contains(text(), '...')]
> working.

Works for me!

> Be assured, traverse_text will stick around.

Thanks why!
--
Egotism is the anesthetic that dulls the pain of stupidity.
-Frank William Leahy, football coach (1908-1973)