comp.lang.ruby

Hpricot innerTEXT?

Bontina Chen

4/13/2007 8:11:00 AM



Hi


I'm using hpricot to parse the following file.

<item
rdf:about="http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn...
<title>[from morwyn] * HTML for the Conceptually Challenged</title>
<link>http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn<...
<description>HTML for the Conceptually Challenged. Very basic tutorial,
plainly worded for people who hate to read instructions.</description>
<dc:creator>morwyn</dc:creator>
<dc:date>2006-10-10T07:28:28Z</dc:date>
<dc:subject>html imported webpagedesign</dc:subject>
<taxo:topics>
<rdf:Bag>
<rdf:li resource="http://del.icio.us/tag/impo... />
<rdf:li resource="http://del.icio.us/tag/... />
<rdf:li resource="http://del.icio.us/tag/webpagede... />
</rdf:Bag>
</taxo:topics>
</item>

I'm trying to get the content from <dc:subject> like this

doc = Hpricot.parse(File.read("965.xhtml"))

(doc/"item").each do |t|
  puts (t/"dc:subject").innerTEXT
end

but I got

<dc:subject>html internet tutorial web</dc:subject>

while I only need "html internet tutorial web"

Does anyone know the right method to call?

Thanks

--
Posted via http://www.ruby-....

9 Answers

Lionel

4/13/2007 9:55:00 AM

On Apr 13, 10:11 am, Bontina Chen <abonc...@gmail.com> wrote:
> Hi
>
> I'm using hpricot to parse the following file.
>
> <item
> rdf:about="http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn...
> <title>[from morwyn] * HTML for the Conceptually Challenged</title>
> <link>http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn<...
> <description>HTML for the Conceptually Challenged. Very basic tutorial,
> plainly worded for people who hate to read instructions.</description>
> <dc:creator>morwyn</dc:creator>
> <dc:date>2006-10-10T07:28:28Z</dc:date>
> <dc:subject>html imported webpagedesign</dc:subject>
> <taxo:topics>
> <rdf:Bag>
> <rdf:li resource="http://del.icio.us/tag/impo... />
> <rdf:li resource="http://del.icio.us/tag/... />
> <rdf:li resource="http://del.icio.us/tag/webpagede... />
> </rdf:Bag>
> </taxo:topics>
> </item>
>
> I'm trying to get the content from <dc:subject> like this
>
> doc = Hpricot.parse(File.read("965.xhtml"))
>
> (doc/"item").each do |t|
>
> puts (t/"dc:subject").innerTEXT
>
> end
>
> but I got
>
> <dc:subject>html internet tutorial web</dc:subject>
>
> while I only need "html internet tutorial web"
>
> Anyone knows what's the right function to call?
>
> THanks
>
> --
> Posted viahttp://www.ruby-....

Replace innerTEXT with inner_html:

(doc/"item").each do |t|
  puts (t/"dc:subject").inner_html
end

regards
Lionel

Bontina Chen

4/13/2007 10:10:00 AM

Lionel Orry wrote:
> On Apr 13, 10:11 am, Bontina Chen <abonc...@gmail.com> wrote:
>> <dc:creator>morwyn</dc:creator>
>>
>> but I got
>> Posted viahttp://www.ruby-....
> replace innerTEXT by inner_html:
>
> (doc/"item").each do |t|
> puts (t/"dc:subject").inner_html
> end
>
> regards
> Lionel

Thanks for your response, but I still get
<dc:subject>html internet tutorial web</dc:subject>



Lionel

4/13/2007 11:43:00 AM

On Apr 13, 12:10 pm, Bontina Chen <abonc...@gmail.com> wrote:
> Lionel Orry wrote:
> > On Apr 13, 10:11 am, Bontina Chen <abonc...@gmail.com> wrote:
> >> <dc:creator>morwyn</dc:creator>
>
> >> but I got
> >> Posted viahttp://www.ruby-....
> > replace innerTEXT by inner_html:
>
> > (doc/"item").each do |t|
> > puts (t/"dc:subject").inner_html
> > end
>
> > regards
> > Lionel
>
> Thx for your response , but I still get
> <dc:subject>html internet tutorial web</dc:subject>
>
> --
> Posted viahttp://www.ruby-....

In fact, inner_text works as well. But you should have a look at the
warnings from ruby! The inner_text or inner_html function is applied
to 'puts (t/"dc:subject")' return object, which is nil.
So a warning appears:
rdf.rb:6: undefined method `inner_html' for nil:NilClass (NoMethodError)

but 'puts (t/"dc:subject")' is executed, and so '<dc:subject>html
internet tutorial web</dc:subject>' is displayed anyway. Therefore I
recommend using a few parentheses there:

puts((t/"dc:subject").inner_text)

and it should work well this time.

Next time, look at the warnings!!! ;)

regards
Lionel

Brian Candler

4/13/2007 11:54:00 AM

On Fri, Apr 13, 2007 at 08:45:08PM +0900, chickenkiller wrote:
> On Apr 13, 12:10 pm, Bontina Chen <abonc...@gmail.com> wrote:
> > Lionel Orry wrote:
> > > On Apr 13, 10:11 am, Bontina Chen <abonc...@gmail.com> wrote:
> > >> <dc:creator>morwyn</dc:creator>
> >
> > >> but I got
> > >> Posted viahttp://www.ruby-....
> > > replace innerTEXT by inner_html:
> >
> > > (doc/"item").each do |t|
> > > puts (t/"dc:subject").inner_html
> > > end
> >
> > > regards
> > > Lionel
> >
> > Thx for your response , but I still get
> > <dc:subject>html internet tutorial web</dc:subject>
> >
> > --
> > Posted viahttp://www.ruby-....
>
> In fact, inner_text works as well. But you should have a look at the
> warnings from ruby! The inner_text or inner_html function is applied
> to 'puts (t/"dc:subject")' return object, which is nil.
> So a warning appears:
> rdf.rb:6: undefined method `inner_html' for nil:NilClass
> (NoMethodError)

That's not a warning, that's an exception, and the program will terminate at
that point. The OP didn't mention any errors.

> but 'puts (t/"dc:subject")' is executed, and so '<dc:subject>html
> internet tutorial web</dc:subject>' is displayed anyway. Therefore I
> recommend using a few parentheses there:
>
> puts((t/"dc:subject").inner_text)
>
> and it should work well this time.
>
> Next time, look at the warnings!!! ;)

Good point, but it was OK the way he wrote it, with a space after puts.

irb(main):003:0> p (1+3).to_s
"4"
=> nil
irb(main):004:0> p(1+3).to_s
4
=> ""

In the first case, this is p( (1+3).to_s )

In the second case, this is ( p(1+3) ).to_s # i.e. nil.to_s
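
To make the two parses visible without relying on what p or puts returns, here is a minimal sketch. The helper `show` and the constant `CALLS` are hypothetical names, not from this thread; `show` just records its argument and returns nil, the way puts does. This reflects top-level code only.

```ruby
# Hypothetical helper: records what it was called with and returns nil,
# standing in for puts.
CALLS = []

def show(arg)
  CALLS << arg
  nil
end

# With a space, the parenthesised expression plus .to_s is the argument.
with_space = show (1 + 3).to_s     # parsed as show((1 + 3).to_s)

# Without a space, the parentheses are the argument list,
# and .to_s is called on whatever show returns (nil).
without_space = show(1 + 3).to_s   # parsed as (show(1 + 3)).to_s

p CALLS          # ["4", 4]
p with_space     # nil
p without_space  # ""
```

The first call receives the string "4", the second receives the integer 4 and then converts the nil return value to an empty string.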

Lionel

4/13/2007 1:38:00 PM

On Apr 13, 1:53 pm, Brian Candler <B.Cand...@pobox.com> wrote:
> On Fri, Apr 13, 2007 at 08:45:08PM +0900, chickenkiller wrote:
> > On Apr 13, 12:10 pm, Bontina Chen <abonc...@gmail.com> wrote:
> > > Lionel Orry wrote:
> > > > On Apr 13, 10:11 am, Bontina Chen <abonc...@gmail.com> wrote:
> > > >> <dc:creator>morwyn</dc:creator>
>
> > > >> but I got
> > > >> Posted viahttp://www.ruby-....
> > > > replace innerTEXT by inner_html:
>
> > > > (doc/"item").each do |t|
> > > > puts (t/"dc:subject").inner_html
> > > > end
>
> > > > regards
> > > > Lionel
>
> > > Thx for your response , but I still get
> > > <dc:subject>html internet tutorial web</dc:subject>
>
> > > --
> > > Posted viahttp://www.ruby-....
>
> > In fact, inner_text works as well. But you should have a look at the
> > warnings from ruby! The inner_text or inner_html function is applied
> > to 'puts (t/"dc:subject")' return object, which is nil.
> > So a warning appears:
> > rdf.rb:6: undefined method `inner_html' for nil:NilClass
> > (NoMethodError)
>
> That's not a warning, that's an exception, and the program will terminate at
> that point. The OP didn't mention any errors.

Indeed, I used the term 'warning' very loosely; I apologize for this.
This is an exception and nothing else.

>
> > but 'puts (t/"dc:subject")' is executed, and so '<dc:subject>html
> > internet tutorial web</dc:subject>' is displayed anyway. Therefore I
> > recommend using a few parentheses there:
>
> > puts((t/"dc:subject").inner_text)
>
> > and it should work well this time.
>
> > Next time, look at the warnings!!! ;)
>
> Good point, but it was OK the way he wrote it, with a space after puts.
>
> irb(main):003:0> p (1+3).to_s
> "4"
> => nil
> irb(main):004:0> p(1+3).to_s
> 4
> => ""
>
> In the first case, this is p( (1+3).to_s )
>
> In the second case, this is ( p(1+3) ).to_s # i.e. nil.to_s

Hmm, interesting. It seems that the problem arises only inside a
block:

# output text in comments...
require 'hpricot'

doc = Hpricot(File.open("rdf.xhtml"))

puts (doc/"item"/"dc:subject").inner_text
# html imported webpagedesign

(doc/"item").each do |t|
  puts((t/"dc:subject").inner_text)
end
# html imported webpagedesign

(doc/"item").each do |t|
  puts (t/"dc:subject").inner_text
end
# <dc:subject>html imported webpagedesign</dc:subject>
# rdf.rb:12: warning: don't put space before argument parentheses
# rdf.rb:12: undefined method `inner_text' for nil:NilClass (NoMethodError)
# from rdf.rb:11:in `each'
# from rdf.rb:11

I wonder what the difference is between the last two blocks.
Any ideas?

Lionel

Brian Candler

4/13/2007 1:49:00 PM

On Fri, Apr 13, 2007 at 10:40:05PM +0900, chickenkiller wrote:
> doc = Hpricot(File.open("rdf.xhtml"))
>
> puts (doc/"item"/"dc:subject").inner_text
> # html imported webpagedesign
>
> (doc/"item").each do |t|
> puts((t/"dc:subject").inner_text)
> end
> # html imported webpagedesign
>
> (doc/"item").each do |t|
> puts (t/"dc:subject").inner_text
> end
> # <dc:subject>html imported webpagedesign</dc:subject>
> # rdf.rb:12: warning: don't put space before argument parentheses
> # rdf.rb:12: undefined method `inner_text' for nil:NilClass
> (NoMethodError)
> # from rdf.rb:11:in `each'
> # from rdf.rb:11
>
> I am wondering where the difference is between the two last blocks.
> Any ideas?

Hmm, looks like this should be something that can be replicated without
hpricot.

$ cat x.rb
x = 3
puts (x-5).abs

1.times do
puts (x-5).abs
end
$ ruby -v
ruby 1.8.4 (2005-12-24) [i486-linux]
$ ruby x.rb
x.rb:5: warning: don't put space before argument parentheses
2
-2
x.rb:5: undefined method `abs' for nil:NilClass (NoMethodError)
from x.rb:4
$

Congratulations, I think you've found a bug in the parser :-) I'll post this
example to ruby-core.

Regards,

Brian.

Lionel

4/13/2007 1:57:00 PM

On Apr 13, 3:48 pm, Brian Candler <B.Cand...@pobox.com> wrote:
> On Fri, Apr 13, 2007 at 10:40:05PM +0900, chickenkiller wrote:
> > doc = Hpricot(File.open("rdf.xhtml"))
>
> > puts (doc/"item"/"dc:subject").inner_text
> > # html imported webpagedesign
>
> > (doc/"item").each do |t|
> > puts((t/"dc:subject").inner_text)
> > end
> > # html imported webpagedesign
>
> > (doc/"item").each do |t|
> > puts (t/"dc:subject").inner_text
> > end
> > # <dc:subject>html imported webpagedesign</dc:subject>
> > # rdf.rb:12: warning: don't put space before argument parentheses
> > # rdf.rb:12: undefined method `inner_text' for nil:NilClass
> > (NoMethodError)
> > # from rdf.rb:11:in `each'
> > # from rdf.rb:11
>
> > I am wondering where the difference is between the two last blocks.
> > Any ideas?
>
> Hmm, looks like this should be something that can be replicated without
> hpricot.
>
> $ cat x.rb
> x = 3
> puts (x-5).abs
>
> 1.times do
> puts (x-5).abs
> end
> $ ruby -v
> ruby 1.8.4 (2005-12-24) [i486-linux]
> $ ruby x.rb
> x.rb:5: warning: don't put space before argument parentheses
> 2
> -2
> x.rb:5: undefined method `abs' for nil:NilClass (NoMethodError)
> from x.rb:4
> $
>
> Congratulations, I think you've found a bug in the parser :-) I'll post this
> example to ruby-core.
>
> Regards,
>
> Brian.

Thanks for your help. I have the same output with this version:

ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-mswin32]

regards,
Lionel

John Joyce

4/14/2007 12:13:00 AM

On Apr 13, 2007, at 10:48 PM, Brian Candler wrote:

> On Fri, Apr 13, 2007 at 10:40:05PM +0900, chickenkiller wrote:
>> doc = Hpricot(File.open("rdf.xhtml"))
>>
>> puts (doc/"item"/"dc:subject").inner_text
>> # html imported webpagedesign
>>
>> (doc/"item").each do |t|
>> puts((t/"dc:subject").inner_text)
>> end
>> # html imported webpagedesign
>>
>> (doc/"item").each do |t|
>> puts (t/"dc:subject").inner_text
>> end
>> # <dc:subject>html imported webpagedesign</dc:subject>
>> # rdf.rb:12: warning: don't put space before argument parentheses
>> # rdf.rb:12: undefined method `inner_text' for nil:NilClass
>> (NoMethodError)
>> # from rdf.rb:11:in `each'
>> # from rdf.rb:11
>>
>> I am wondering where the difference is between the two last blocks.
>> Any ideas?
>
> Hmm, looks like this should be something that can be replicated
> without
> hpricot.
>
> $ cat x.rb
> x = 3
> puts (x-5).abs
>
> 1.times do
> puts (x-5).abs
> end
> $ ruby -v
> ruby 1.8.4 (2005-12-24) [i486-linux]
> $ ruby x.rb
> x.rb:5: warning: don't put space before argument parentheses
> 2
> -2
> x.rb:5: undefined method `abs' for nil:NilClass (NoMethodError)
> from x.rb:4
> $
>
> Congratulations, I think you've found a bug in the parser :-) I'll
> post this
> example to ruby-core.
>
> Regards,
>
> Brian.
>
Inside the do-end or {} block, use this:
puts((x - 5).abs)
It is more explicit, but it is correct and works.

So,
>>>> (doc/"item").each do |t|
>>>> puts (t/"dc:subject").inner_html
>>>> end
>>>>
will work as
(doc/"item").each do |t|
  puts((t/"dc:subject").inner_html)
end
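
As a rough sketch of that advice, the fully parenthesised call leaves nothing for the parser to guess, so it should read the same at top level, inside do...end, and inside { }. The helper `emit` and the constant `OUTPUT` are hypothetical stand-ins for puts (they collect values instead of printing) and are not from this thread.

```ruby
# Hypothetical stand-in for puts: collects values and returns nil.
OUTPUT = []

def emit(value)
  OUTPUT << value
  nil
end

x = 3

# Explicit argument parentheses, no space after the method name.
emit((x - 5).abs)               # top level

1.times do
  emit((x - 5).abs)             # inside do...end
end

1.times { emit((x - 5).abs) }   # inside { }

p OUTPUT   # [2, 2, 2]
```

All three calls pass the absolute value to the method, which is the behaviour the original poster wanted from puts.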

Florian Gilcher

4/14/2007 9:49:00 AM

I prefer this version for the initial problem:

irb(main):045:0> elements = doc.search('dc:subject/text()')
=> #<Hpricot::Elements["html imported webpagedesign"]>

irb(main):048:0> elements.first.to_s
=> "html imported webpagedesign"
irb(main):049:0> elements.first.parent
=> {elem <dc:subject> "html imported webpagedesign" </dc:subject>}

