comp.lang.python

searching an XML doc

jrrtolkienfan@gmail.com

1/15/2008 8:34:00 PM

Hello,

I've been reading about ElementTree and ElementPath so I could use
them to find the right elements in the DOM. Unfortunately, neither of
these seems to offer XPath-like capabilities where I can find elements
based on tag, attribute values, etc. Are there any libraries that can
give me XPath-like functionality?

Thanks in advance
5 Answers

Diez B. Roggisch

1/15/2008 9:50:00 PM

Gowri wrote:
> Hello,
>
> I've been reading about ElementTree and ElementPath so I could use
> them to find the right elements in the DOM. Unfortunately neither of
> these seem to offer XPath like capabilities where I can find elements
> based on tag, attribute values etc. Are there any libraries which can
> give me XPath like functionality?


lxml does that.

Diez

jrrtolkienfan@gmail.com

1/16/2008 12:47:00 AM

On Jan 15, 3:49 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
> Gowri wrote:
>
> > Hello,
>
> > I've been reading about ElementTree and ElementPath so I could use
> > them to find the right elements in the DOM. Unfortunately neither of
> > these seem to offer XPath like capabilities where I can find elements
> > based on tag, attribute values etc. Are there any libraries which can
> > give me XPath like functionality?
>
> lxml does that.
>
> Diez

Hi Diez

I was trying lxml out and was unable to find any examples that would
help me parse an XML file with namespaces. For example, my XML file
looks like this:

<phedexData xmlns="http://a.b.com/ph...
xmlns:xsi="http://www.w3.org/2001/XMLSchema-inst...
xsi:schemaLocation="http://a.b.... requests.xsd">
  <!-- Low priority replication request -->
  <request id="1234" last_update="1060199000.0">
    <status>
      <approved>T1_RAL_MSS</approved>
      <approved>T2_London_ICHEP</approved>
      <disapproved>T2_Southgrid_Bristol</disapproved>
      <pending/>
      <move_pending/>
    </status>
    <subscription open="1" priority="0" type="replicate">
      <items>
        <dataset>/PrimaryDS1/ProcessedDS1/Tier</dataset>
        <block>/PrimaryDS2/ProcessedDS2/Tier/block</block>
      </items>
    </subscription>
  </request>
</phedexData>

If my XPath query is //request, it obviously will not match anything,
because the elements live in the default namespace. Is there some sort
of namespace registration that has to be done before issuing a query?
Example code would help a lot.


G F

1/16/2008 10:21:00 AM

On Jan 15, 9:33 pm, Gowri <gowr...@gmail.com> wrote:
> Hello,
>
> I've been reading about ElementTree and ElementPath so I could use
> them to find the right elements in the DOM. Unfortunately neither of
> these seem to offer XPath like capabilities where I can find elements
> based on tag, attribute values etc. Are there any libraries which can
> give me XPath like functionality?
>
> Thanks in advance

Create your query like:

ns0 = '{http://a.b....}'

query = '%srequest/%sstatus' % (ns0, ns0)
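
As a rough illustration of the idea above, here is a minimal sketch using only the standard library's ElementTree. The namespace URI `http://example.com/phedex` and the trimmed-down document are placeholders made up for this example, since the real URI is truncated in the thread. It shows that an unqualified tag name finds nothing, while the same path with each step written as `{uri}tag` does.

```python
from xml.etree import ElementTree as ET

# Hypothetical namespace URI (the real one is truncated in the thread).
NS_URI = "http://example.com/phedex"

# Trimmed-down version of the document from the question.
DOC = """<phedexData xmlns="%s">
  <request id="1234" last_update="1060199000.0">
    <status>
      <approved>T1_RAL_MSS</approved>
      <approved>T2_London_ICHEP</approved>
      <disapproved>T2_Southgrid_Bristol</disapproved>
    </status>
  </request>
</phedexData>""" % NS_URI

root = ET.fromstring(DOC)

# An unqualified name does not match elements in the default namespace.
unqualified = root.findall("request")

# Prefix every step of the path with the namespace in {uri} form.
ns0 = "{%s}" % NS_URI
query = "%srequest/%sstatus/%sapproved" % (ns0, ns0, ns0)
approved = [el.text for el in root.findall(query)]

print(unqualified)
print(approved)
```

The path is relative to `root` (the `phedexData` element), so it starts at `request` rather than at the document element.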

Also, although imperfect, some people have found this useful:

http://gflanagan.net/site/python/utils/elem...elementfil...

[CODE]

# Python 2 syntax (print statements), as posted.
from xml.etree import ElementTree as ET

# http://gflanagan.net/site/python/utils/elem...elementfilter.py.txt
from rattlebag.elementfilter import findall, data

test = '''<phedexData xmlns="http://a.b...."
xmlns:xsi="http://www.w3.org/2001/XMLSchema-inst...
xsi:schemaLocation="http://a.b.... requests.xsd">
<!-- Low priority replication request -->
<request id="1234" last_update="1060199000.0">
<status>
<approved>T1_RAL_MSS</approved>
<approved>T2_London_ICHEP</approved>
<disapproved>T2_Southgrid_Bristol</disapproved>
<pending/>
<move_pending/>
</status>
<subscription open="1" priority="0" type="replicate">
<items>
<dataset>/PrimaryDS1/ProcessedDS1/Tier</dataset>
<block>/PrimaryDS2/ProcessedDS2/Tier/block</block>
</items>
</subscription>
</request>
</phedexData>
'''

root = ET.fromstring(test)

# Namespace in {uri} form, prepended to every tag name in a query.
ns0 = '{http://a.b....}'

query0 = '%(ns)srequest/%(ns)sstatus' % {'ns': ns0}
query1 = '%(ns)srequest/%(ns)ssubscription[@type=="replicate"]/%(ns)sitems' % {'ns': ns0}
query2 = '%(ns)srequest[@id==1234]/%(ns)sstatus/%(ns)sapproved' % {'ns': ns0}

print 'With ElementPath: '
print root.findall(query0)
print
print 'With ElementFilter:'
for query in [query0, query1, query2]:
    print
    print '+'*50
    print 'query: ', query
    print
    # Dump every element matched by the current query.
    for item in findall(root, query):
        print 'item: ', item
        print 'xml:'
        ET.dump(item)

print '-'*50
print
# data() returns the text of the matched elements.
print 'approved: ', data(root, query2)

[/CODE]

[OUTPUT]
With ElementPath:
[<Element {http://a.b....}status at b95ad0>]

With ElementFilter:

++++++++++++++++++++++++++++++++++++++++++++++++++
query: {http://a.b....}request/{http://a.b....}status

item: <Element {http://a.b....}status at b95ad0>
xml:
<ns0:status xmlns:ns0="http://a.b....">
<ns0:approved>T1_RAL_MSS</ns0:approved>
<ns0:approved>T2_London_ICHEP</ns0:approved>
<ns0:disapproved>T2_Southgrid_Bristol</ns0:disapproved>
<ns0:pending />
<ns0:move_pending />
</ns0:status>


++++++++++++++++++++++++++++++++++++++++++++++++++
query: {http://a.b....}request/{http:...phedex}subscription[@type=="replicate"]/{http://a.b....}items

item: <Element {http://a.b....}items at b95eb8>
xml:
<ns0:items xmlns:ns0="http://a.b....">
<ns0:dataset>/PrimaryDS1/ProcessedDS1/Tier</ns0:dataset>
<ns0:block>/PrimaryDS2/ProcessedDS2/Tier/block</ns0:block>
</ns0:items>


++++++++++++++++++++++++++++++++++++++++++++++++++
query: {http://a.b....}request[@id==1234]/{http:...phedex}status/{http://a.b....}approved

item: <Element {http://a.b....}approved at b95cd8>
xml:
<ns0:approved xmlns:ns0="http://a.b....">T1_RAL_MSS</ns0:approved>

item: <Element {http://a.b....}approved at b95cb0>
xml:
<ns0:approved xmlns:ns0="http://a.b....">T2_London_ICHEP</ns0:approved>

--------------------------------------------------

approved: ['T1_RAL_MSS', 'T2_London_ICHEP']
INFO End logging.
[/OUTPUT]
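
For comparison, the attribute-based queries above can probably be approximated without the extra module, since the ElementPath subset in the standard library (Python 2.7 and later) accepts `[@attr='value']` predicates. Note that this is the standard single `=` with a quoted value, not the `==` syntax used by elementfilter. The sketch below is only an approximation under those assumptions; the namespace URI `http://example.com/phedex` is a made-up placeholder, and a list comprehension over `.text` stands in for `data()`.

```python
from xml.etree import ElementTree as ET

# Hypothetical namespace URI (the real one is truncated in the thread).
NS_URI = "http://example.com/phedex"

DOC = """<phedexData xmlns="%s">
  <request id="1234" last_update="1060199000.0">
    <status>
      <approved>T1_RAL_MSS</approved>
      <approved>T2_London_ICHEP</approved>
      <disapproved>T2_Southgrid_Bristol</disapproved>
    </status>
    <subscription open="1" priority="0" type="replicate">
      <items>
        <dataset>/PrimaryDS1/ProcessedDS1/Tier</dataset>
        <block>/PrimaryDS2/ProcessedDS2/Tier/block</block>
      </items>
    </subscription>
  </request>
</phedexData>""" % NS_URI

root = ET.fromstring(DOC)
ns = {"ns": "{%s}" % NS_URI}

# Same shape as query1 and query2 above, with standard predicate syntax.
query1 = "%(ns)srequest/%(ns)ssubscription[@type='replicate']/%(ns)sitems" % ns
query2 = "%(ns)srequest[@id='1234']/%(ns)sstatus/%(ns)sapproved" % ns

items = root.findall(query1)
item_texts = [child.text for child in items[0]]

# Collect the text of each match, roughly what data() returns above.
approved = [el.text for el in root.findall(query2)]

print(item_texts)
print(approved)
```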

jrrtolkienfan@gmail.com

1/16/2008 10:37:00 AM

Hi Gerard,

I don't know what to say :) Thank you so much for taking the time to
post all of this. I truly appreciate it :)

Stefan Behnel

1/16/2008 8:13:00 PM

grflanagan wrote:
> On Jan 15, 9:33 pm, Gowri <gowr...@gmail.com> wrote:
>> I've been reading about ElementTree and ElementPath so I could use
>> them to find the right elements in the DOM. Unfortunately neither of
>> these seem to offer XPath like capabilities where I can find elements
>> based on tag, attribute values etc. Are there any libraries which can
>> give me XPath like functionality?
>
> Create your query like:
>
> ns0 = '{http://a.b....}'
>
> query = '%srequest/%sstatus' % (ns0, ns0)

lxml supports the same thing, BTW, and how to work with namespaces is
explained in the tutorial:

http://codespeak.net/lxml/dev/tutorial.html#...

Stefan
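
An alternative to spelling out `{uri}tag` on every step is to pass a prefix-to-URI mapping through the `namespaces` argument of `findall()`, which both lxml and the standard library's ElementTree accept. The sketch below falls back to the standard library when lxml is not installed. The prefix `p` and the URI `http://example.com/phedex` are arbitrary choices for this example (the real URI is truncated in the thread). lxml's `xpath()` method takes the same kind of mapping for full XPath expressions, which is not shown here.

```python
try:
    from lxml import etree as ET  # third-party, if available
except ImportError:
    from xml.etree import ElementTree as ET  # standard library fallback

# Hypothetical prefix and namespace URI for illustration only.
NSMAP = {"p": "http://example.com/phedex"}

DOC = """<phedexData xmlns="http://example.com/phedex">
  <request id="1234" last_update="1060199000.0">
    <status>
      <approved>T1_RAL_MSS</approved>
      <approved>T2_London_ICHEP</approved>
      <disapproved>T2_Southgrid_Bristol</disapproved>
    </status>
  </request>
</phedexData>"""

root = ET.fromstring(DOC)

# The prefix only has to match the mapping, not the document, which
# declares the namespace as its default (unprefixed) namespace.
requests = root.findall(".//p:request", namespaces=NSMAP)
ids = [req.get("id") for req in requests]

approved = [
    el.text
    for el in root.findall(
        ".//p:request[@id='1234']/p:status/p:approved", namespaces=NSMAP
    )
]

print(ids)
print(approved)
```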