
comp.lang.ruby

Using XPath to retrieve an XML element which contains a given text

anne001

8/10/2008 7:37:00 PM

This code returns the first dataformat element.
And yet the second dataformat is the one containing SPPT.
What am I doing wrong?

require "rexml/document"

include REXML

string = <<EOF
<dataformats>
<dataformat>
<fileidentifiers>
<fileidentifier>CFMT</fileidentifier>
</fileidentifiers>
</dataformat>
<dataformat>
<fileidentifiers>
<fileidentifier>SPPT</fileidentifier>
</fileidentifiers>
</dataformat>
</dataformats>
EOF

doc = Document.new string
xpathquery="//dataformat[contains(fileidentifier, SPPT)]"
p XPath.first(doc,xpathquery).to_s
4 Answers

Dejan Dimic

8/10/2008 10:08:00 PM


On Aug 10, 9:37 pm, anne001 <a...@wjh.harvard.edu> wrote:
> This code returns the first dataformat element.
> And yet the second dataformat is the one containing SPPT.
> What am I doing wrong?
> xpathquery="//dataformat[contains(fileidentifier, SPPT)]"
> p XPath.first(doc,xpathquery).to_s

I think your XPath query should be:
xpathquery="//dataformat[contains(., 'SPPT')]"

or, to be more specific:
xpathquery="//dataformat[contains(fileidentifiers/fileidentifier,'SPPT')]"
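A note on why the original query returned the first element, as far as XPath 1.0 rules go: an unquoted SPPT is not a string but a path to a child element named SPPT. No such element exists, so it yields an empty node-set, which converts to the empty string, and contains(x, '') is true for every x. Every dataformat therefore matches and XPath.first hands back the first one. The sketch below compares the two queries on the data from the question; the `identifiers` helper is just an illustrative name, not part of REXML.

```ruby
require "rexml/document"

xml = <<~XML
  <dataformats>
    <dataformat>
      <fileidentifiers>
        <fileidentifier>CFMT</fileidentifier>
      </fileidentifiers>
    </dataformat>
    <dataformat>
      <fileidentifiers>
        <fileidentifier>SPPT</fileidentifier>
      </fileidentifiers>
    </dataformat>
  </dataformats>
XML

doc = REXML::Document.new(xml)

# Unquoted SPPT is read as a path to a (nonexistent) child element,
# so the predicate ends up testing against an empty string.
unquoted = REXML::XPath.first(doc, "//dataformat[contains(fileidentifier, SPPT)]")

# Quoted 'SPPT' is a string literal, tested against the element's full text.
quoted = REXML::XPath.first(doc, "//dataformat[contains(., 'SPPT')]")

# Illustrative helper (hypothetical name): list the identifiers inside a match.
identifiers = lambda do |dataformat|
  REXML::XPath.match(dataformat, "fileidentifiers/fileidentifier").map(&:text)
end

p identifiers.call(unquoted)
p identifiers.call(quoted)
```

The first line printed shows which dataformat the unquoted query picked, the second which one the quoted query picked.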


anne001

8/10/2008 11:43:00 PM


Thank you, the first formulation works.

I had tried the second one on the complete xml file and it does not
work.
Do you have an idea why? Is there a typo I am not seeing?

Here is a test file a little closer to the XML file I am working with

require "rexml/document"
include REXML

string = <<EOF
<dataformats>
<dataformat>
<name>NARSAD recognition</name>
<fileidentifiers>
<fileidentifier>NARSAD</fileidentifier>
</fileidentifiers>
</dataformat>
<dataformat>
<name>SPFT</name>
<fileidentifiers>
<fileidentifier>SPFT</fileidentifier>
<fileidentifier>SPPT</fileidentifier>
</fileidentifiers>
</dataformat>
</dataformats>
EOF

doc = Document.new string

xpathquery="//dataformat[contains(., 'SPPT')]"
p 'yours1'
p XPath.first(doc,xpathquery).to_s

xpathquery="//dataformat[contains(fileidentifiers/fileidentifier,'SPPT')]"
p 'yours2'
p XPath.first(doc,xpathquery).to_s

result
"yours1"
"<dataformat>\n\t\t<name>SPFT</name>\n\t\t<fileidentifiers>\n\t\t\t<fileidentifier>SPFT</fileidentifier>\n\t\t\t<fileidentifier>SPPT</fileidentifier>\n\t\t</fileidentifiers>\n\t</dataformat>"
"yours2"
""

Robert Klemme

8/11/2008 3:11:00 PM


Hi Anne,

welcome back!

2008/8/11 anne001 <anne@wjh.harvard.edu>:
> Thank you, the first formulation works.
>
> I had tried the second one on the complete xml file and it does not
> work.
> Do you have an idea why? Is there a typo I am not seeing?
> xpathquery="//dataformat[contains(fileidentifiers/
> fileidentifier,'SPPT')]"
> p 'yours2'
> p XPath.first(doc,xpathquery).to_s

I believe "contains" is the wrong function as it does a textual
comparison and I have no idea whether a node is actually allowed as
input. I believe the correct XPath expression is this:

"//dataformat[descendant::fileidentifier[text()='SPPT']]"

Here are some expressions that you may want to try:

# find the correct fileidentifier
XPath.each doc, "//fileidentifier[text()='SPPT']" do |elm|
  puts elm
end

puts '-------------'

# go upwards from there to find the dataformat node
XPath.each doc, "//fileidentifier[text()='SPPT']/ancestor::dataformat" do |elm|
  puts elm
end

puts '-------------'

# select all dataformats that contain a fileidentifier with text "SPPT"
# this seems to best reflect what you want
XPath.each doc,
  "//dataformat[descendant::fileidentifier[text()='SPPT']]" do |elm|
  puts elm
end
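On the question of whether a node-set is allowed as input to contains(): under XPath 1.0 it is, but it gets converted to a string using only its first node in document order. In the second test document the first fileidentifier of the matching dataformat is SPFT, so contains(fileidentifiers/fileidentifier,'SPPT') only ever looks at "SPFT" and fails, which would explain the empty "yours2" result. Putting the test in a predicate on fileidentifier checks each node individually. A minimal sketch on the same test data (the variable names are only illustrative):

```ruby
require "rexml/document"

xml = <<~XML
  <dataformats>
    <dataformat>
      <name>NARSAD recognition</name>
      <fileidentifiers>
        <fileidentifier>NARSAD</fileidentifier>
      </fileidentifiers>
    </dataformat>
    <dataformat>
      <name>SPFT</name>
      <fileidentifiers>
        <fileidentifier>SPFT</fileidentifier>
        <fileidentifier>SPPT</fileidentifier>
      </fileidentifiers>
    </dataformat>
  </dataformats>
XML

doc = REXML::Document.new(xml)

# Node-set passed to contains(): only the first fileidentifier is considered.
node_set_form = REXML::XPath.first(
  doc, "//dataformat[contains(fileidentifiers/fileidentifier, 'SPPT')]"
)

# Predicate on fileidentifier: every fileidentifier is tested on its own.
per_node_form = REXML::XPath.first(
  doc, "//dataformat[fileidentifiers/fileidentifier[text()='SPPT']]"
)

p node_set_form
p per_node_form && per_node_form.elements["name"].text
```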

Btw, I have these bookmarked and they serve me well with regard to
XPath issues (I always have to look them up):
http://www.w3schools.com/xpath/d...
http://www.zvon.org/xxl/XPathTutorial/General/exa...

(I use the first one most of the time.)

Kind regards

robert


--
use.inject do |as, often| as.you_can - without end

Robert Klemme

8/11/2008 4:44:00 PM


On 11.08.2008 17:11, Robert Klemme wrote:
> I believe "contains" is the wrong function as it does a textual
> comparison and I have no idea whether a node is actually allowed as
> input. I believe the correct XPath expression is this:

Wait, change "correct" to "more appropriate".

> "//dataformat[descendant::fileidentifier[text()='SPPT']]"
>
> Here are some expressions that you may want to try:

Here are even more that yield the result you want (or so I believe):

[
  "//dataformat[descendant::fileidentifier[text()='SPPT']]",
  "//dataformat[fileidentifiers/fileidentifier[text()='SPPT']]",
  "//dataformat[descendant::fileidentifier[contains(text(),'SPPT')]]",
  "//dataformat[fileidentifiers/fileidentifier[contains(text(),'SPPT')]]",
  "//dataformat[descendant::fileidentifier[starts-with(text(),'SPPT')]]",
  "//dataformat[fileidentifiers/fileidentifier[starts-with(text(),'SPPT')]]",
  "//dataformat[descendant::fileidentifier[ends-with(text(),'SPPT')]]",
  "//dataformat[fileidentifiers/fileidentifier[ends-with(text(),'SPPT')]]",
].each do |xpath|
  printf "\nXPath: %p\n\n", xpath

  XPath.each doc, xpath do |elm|
    puts elm
  end
end

Interestingly ends-with() does not seem to work. Maybe we hit a REXML bug.
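
A possible explanation other than a bug: ends-with() belongs to XPath 2.0 and is not among the XPath 1.0 string functions (which do include starts-with() and contains()), and REXML targets XPath 1.0. One workaround, sketched here under that assumption, is to select the candidates with XPath and do the suffix test in Ruby with String#end_with?. The `suffix` and `matches` names are illustrative only.

```ruby
require "rexml/document"

xml = <<~XML
  <dataformats>
    <dataformat>
      <name>NARSAD recognition</name>
      <fileidentifiers>
        <fileidentifier>NARSAD</fileidentifier>
      </fileidentifiers>
    </dataformat>
    <dataformat>
      <name>SPFT</name>
      <fileidentifiers>
        <fileidentifier>SPFT</fileidentifier>
        <fileidentifier>SPPT</fileidentifier>
      </fileidentifiers>
    </dataformat>
  </dataformats>
XML

doc = REXML::Document.new(xml)
suffix = "SPPT"

# Select every dataformat with XPath, then keep those that have at least
# one fileidentifier whose text ends with the suffix (checked in Ruby).
matches = REXML::XPath.match(doc, "//dataformat").select do |dataformat|
  REXML::XPath.match(dataformat, "fileidentifiers/fileidentifier").any? do |fi|
    fi.text.to_s.end_with?(suffix)
  end
end

matches.each { |elm| puts elm }
```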

XPath nicely fits Ruby because of TIMTOWTDI. :-)

Kind regards

robert