comp.lang.ruby

[REXML] Raw Elements

tsawyer

1/21/2005 11:09:00 PM

I'm having trouble getting elements to be raw. I use, for example:

> d = Document.new( str, { :raw => [ 'O', 'T', 'V' ] } )

Then when I traverse the document and query Element#raw it does say
'true' for these tags, but it still appears that they have been parsed
and I can't get the raw text.

> e.get_text.value

Returns the same thing as

> e.text

Is there another way one is supposed to use to get at the raw text?
Thanks,
T.

12 Answers

tsawyer

1/22/2005 12:52:00 AM

Sigh, I just realized I misunderstood what raw meant --it just
relates to entity parsing. Why am I getting the feeling that there is
no way to prevent parsing of the body of an element? I pray this is not
the case, b/c it means back to the drawing board for something like the
13th time :-(. But if it is the case, can anyone recommend another XML
parser that can do this?

Thanks,
T.

Aria Stewart

1/22/2005 1:01:00 AM

On Sat, 2005-01-22 at 09:55 +0900, trans. wrote:
> Sigh, I just realized I misunderstood what raw meant --it just
> relates to entity parsing. Why am I getting the feeling that there is
> no way to prevent parsing of the body of an element? I pray this is not
> the case, b/c it means back to the drawing board for something like the
> 13th time :-(. But if it is the case, can anyone recommend another XML
> parser that can do this?

None that I know of. The problem is this: where do you continue from,
and how do you know if not by parsing?

Ari

----
Ruby web hosting? http://theinternetco.net/o...



tsawyer

1/22/2005 1:29:00 AM

Well, I just want to specify a tag and anything in that tag would be
left verbatim. That's all really. I'm trying to find info on libxml
bindings (rather difficult to find it seems) though I have a feeling
that won't work either.
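
A minimal sketch of that request, assuming plain Ruby with no XML library: scan for the opening tag and take everything up to the first matching close tag, untouched. The name `verbatim_bodies` is hypothetical, not part of REXML or libxml. It also assumes the tag never nests inside itself and is never written in self-closing form, which is one way to answer "where do you continue from".

```ruby
# Hypothetical helper, not from any library mentioned in the thread.
# Returns the unparsed text between <tag ...> and the first </tag>.
# Assumption: the tag does not nest inside itself and is not self-closing.
def verbatim_bodies(xml, tag)
  name = Regexp.escape(tag)
  # (.*?) is non-greedy, so it stops at the first closing tag;
  # the /m flag lets "." match newlines inside the body.
  xml.scan(/<#{name}(?:\s[^>]*)?>(.*?)<\/#{name}>/m).flatten
end

xml = '<doc><T>keep <b>this</b> &amp; that</T><V>x</V></doc>'
p verbatim_bodies(xml, 'T')  # prints ["keep <b>this</b> &amp; that"]
```

The inner markup and the entity come back exactly as written, since nothing inside the tag is ever handed to a parser.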

Worst comes to worst I'll whip out old trusty Tagiter and see if that
will do. Otherwise I'll have to roll my own. Just what I need: more
work!

Thanks Ari.

Zach Dennis

1/22/2005 1:44:00 AM

trans. wrote:
> Well, I just want to specify a tag and anything in that tag would be
> left verbatim. That's all really. I'm trying to find info on libxml
> bindings (rather difficult to find it seems) though I have a feeling
> that won't work either.
>
> Worst comes to worst I'll whip out old trusty Tagiter and see if that
> will do. Otherwise I'll have to roll my own. Just what I need: more
> work!
>

REXML should be pretty easy to manipulate or add functions to. Why roll
your own when you can just add a new behavior?

Zach


tsawyer

1/22/2005 1:53:00 AM

Thanks Zach, that's a fair idea. I did a little REXML hacking a few
years back, so maybe so....

"Sean, do you still frequent this list?" Is it reasonably feasible?

T.

William James

1/22/2005 6:21:00 AM

trans. wrote:
> Well, I just want to specify a tag and anything in that tag would be
> left verbatim. That's all really. I'm trying to find info on libxml
> bindings (rather difficult to find it seems) though I have a feeling
> that won't work either.
>
> Worst comes to worst I'll whip out old trusty Tagiter and see if that
> will do. Otherwise I'll have to roll my own. Just what I need: more
> work!
>
> Thanks Ari.


Here's a micro xml-parser (posted via Google, so the
indentation has been removed):


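# Produces array of nonmatching and matching
# substrings. The size of the array will
# always be an odd number. The first and the
# last item will always be nonmatching.
# "\1" (double-quoted) is the control character \001, used
# as a delimiter; '\&' (single-quoted) is the matched text.
# The -1 limit keeps a trailing empty string, so the array
# stays odd-sized even when s ends with a match.
def shatter( s, re )
s.gsub( re, "\1"+'\&'+"\1" ).split("\1", -1)
end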

def get_attr( s )
h = Hash.new
while s =~ /(\w+)="([^"]*)"/
h[$1] = $2
s = $'
end
h
end

def tag_name( s )
if ( s =~ /^<(\S+)(\s|>)/ )
$1
else
nil
end
end

s = ''
$<.each_line {|x| s=s+x}
all = shatter( s, /<[^>]*>/ )
all.each {|x|
x.chomp!
if x.size > 0
print x
tname = tag_name(x)
print " | " + tname if tname
print "\n"
attr = get_attr( x )
if attr.size > 0
attr.each_pair {|key,val| puts ".....#{key}-->#{val}" }
end
end
}
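
For comparison, the attribute loop in get_attr can be sketched with String#scan, which collects every name="value" pair in one pass. The name `attrs_of` is hypothetical. Like the original, it assumes double-quoted values and attribute names made only of word characters, so names with hyphens or colons are not picked up in full.

```ruby
# Hypothetical variant of get_attr from the post above.
# scan returns [[name, value], ...] and to_h turns that into a Hash.
def attrs_of(tag)
  tag.scan(/(\w+)="([^"]*)"/).to_h
end

attrs = attrs_of('<rating system="ABA">')
puts attrs["system"]  # prints ABA
```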


With this input:

<?xml version="1.0" encoding="UTF-8"?>
<tv><programme start="20041218204000 +1000"
stop="20041218225000+1000" channel="Network TEN Brisbane">
<title>The Frighteners</title>
<sub-title/><desc>A psychic private detective, who
consorts with deceased souls, becomes engaged in a mystery as members
of the town community begin dying mysteriously.</desc>
<rating system="ABA"><value>M</value></rating><length
units="minutes">130</length><category>Horror</category></programme>

the output is:

<?xml version="1.0" encoding="UTF-8"?> | ?xml
.....encoding-->UTF-8
.....version-->1.0
<tv> | tv
<programme start="20041218204000 +1000"
stop="20041218225000+1000" channel="Network TEN Brisbane"> | programme
.....stop-->20041218225000+1000
.....start-->20041218204000 +1000
.....channel-->Network TEN Brisbane
<title> | title
The Frighteners
</title> | /title
<sub-title/> | sub-title/
<desc> | desc
A psychic private detective, who
consorts with deceased souls, becomes engaged in a mystery as members
of the town community begin dying mysteriously.
</desc> | /desc
<rating system="ABA"> | rating
.....system-->ABA
<value> | value
M
</value> | /value
</rating> | /rating
<length
units="minutes"> | length
.....units-->minutes
130
</length> | /length
<category> | category
Horror
</category> | /category
</programme> | /programme
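
As a cross-check of the shatter idea, here is a sketch that gets the same alternating text/tag pieces from String#split with a capturing group, without a delimiter character. The name `split_tags` is hypothetical. Passing -1 as the limit keeps a trailing empty string, so the first and last items are nonmatching even when the input starts or ends with a tag.

```ruby
# Hypothetical helper, not from the thread: split on tags but keep them,
# because a capture group in the pattern makes split return the matches too.
def split_tags(s)
  s.split(/(<[^>]*>)/, -1)
end

pieces = split_tags("<title>The Frighteners</title>")
p pieces       # prints ["", "<title>", "The Frighteners", "</title>", ""]
p pieces.size  # prints 5
```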

tsawyer

1/22/2005 2:54:00 PM

Hey Thanks! Not sure if I'll end up using it since I just spent last night
writing a general purpose stack-based parser. But I'll keep it for
reference.

Love the method name #shatter, BTW.

T.

P.S. FYI, I figured out that you can just use a "margin" character in
order to preserve indentation. For example, I'm using Google Groups now
too:

: class A
: def shatter
: # ...
: end
: end

As to which character you like best, that's your call ;-).
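
The margin trick is easy to automate before pasting. A small sketch, with the hypothetical name `add_margin` and ": " assumed as the default margin:

```ruby
# Hypothetical helper: prefix every line with a margin string so that
# leading whitespace is no longer at the start of the line.
def add_margin(code, margin = ": ")
  code.each_line.map { |line| margin + line }.join
end

puts add_margin("class A\n  def shatter\n  end\nend\n")
```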

Also, I know there is a way to set the google group to a fixed-font
mode (I manage a group and there is that option), but I don't know who
manages this group and thus would be able to set it.

Robert Klemme

1/22/2005 3:32:00 PM

"trans." <tsawyer@gmail.com> wrote in message
news:1106405630.304418.141620@z14g2000cwz.googlegroups.com...
> Hey Thanks! Not sure if I'll end up using it since I just spent last night
> writing a general purpose stack-based parser. But I'll keep it for
> reference.
>
> Love the method name #shatter, BTW.
>
> T.
>
> P.S. FYI, I figured out that you can just use a "margin" character in
> order to preserve indentation. For example, I'm using Google Groups now
> too:
>
> : class A
> : def shatter
> : # ...
> : end
> : end
>
> As to which character you like best, that's your call ;-).

I'd like best a space.

Oh, I'm sorry, just got my silly five minutes. :-)

robert

tsawyer

1/22/2005 5:19:00 PM

> I'd like best a space.

Me too, but what you gonna do?

Also, btw, I should have mentioned that the small size of your parser is
impressive --micro indeed!

T.

William James

1/22/2005 5:27:00 PM

trans. wrote:
> P.S. FYI, I figured out that you can just use a "margin" character in
> order to preserve indentation.


Thanks; I didn't think of that.

.. def shatter( s, re )
..   s.gsub( re, "\1"+'\&'+"\1" ).split("\1", -1)
.. end