Asp Forum - rexml to slow - comp.lang.ruby

Bu Mihai

3/29/2008 7:47:00 AM

I have an XML file and sometimes I call the find_first_recursive method.
When my XML file is small it works fine, but when it has ~900 lines I
wait ~15 seconds for it to return the wanted node, and I want something
faster. How can I get a better time?

I would have tried libxml, but I had some problems installing it under
Windows.
--
Posted via http://www.ruby-....

19 Answers

Mark Ryall

3/29/2008 8:04:00 AM

[Note: parts of this message were removed to make it a legal post.]

have you tried hpricot?

On Sat, Mar 29, 2008 at 6:46 PM, Bu Mihai <mihai.bulhac@yahoo.com> wrote:

> I have an xml file and sometimes i call the find_first_recursive method;
> when my xml file is small its working fine but when i have ~900 lines im
> waiting ~15 seconds to return me the wanted node and i want something
> faster; How can i obtain a better time?
>
> I would have tried libxml but i had some problems to install it under
> windows.
> --
> Posted via http://www.ruby-....
>
>

Bu Mihai

3/29/2008 8:08:00 AM

Mark Ryall wrote:
> have you tried hpricot?

Not yet; is it faster?
--
Posted via http://www.ruby-....

Phlip

3/29/2008 3:37:00 PM

Bu Mihai wrote:

>> have you tried hpricot?
>
> not yet; its faster?

In general, yes. REXML stands for "Ruby Electric XML"; it is written in
pure Ruby and leans heavily on regular expressions, and Regexps are very
slow when you abuse them. Even a simple hand-written parser in pure Ruby
would have been faster.

Hpricot trades a weaker parser for code optimized with C.

It is also more forgiving; sometimes crappy HTML needs that.

http://www.oreillynet.com/onlamp/blog/2007/08/assert_hpri...

--
Phlip

Robert Klemme

3/30/2008 10:55:00 AM

On 29.03.2008 08:46, Bu Mihai wrote:
> I have an xml file and sometimes i call the find_first_recursive method;
> when my xml file is small its working fine but when i have ~900 lines im
> waiting ~15 seconds to return me the wanted node and i want something
> faster; How can i obtain a better time?

What's the find criteria you use? Maybe you can use XPath. 900 lines
does not really sound large so I suspect there might be an algorithmic
or design error.

Kind regards

robert

Bu Mihai

3/30/2008 6:23:00 PM

Robert Klemme wrote:
> On 29.03.2008 08:46, Bu Mihai wrote:
>> I have an xml file and sometimes i call the find_first_recursive method;
>> when my xml file is small its working fine but when i have ~900 lines im
>> waiting ~15 seconds to return me the wanted node and i want something
>> faster; How can i obtain a better time?
>
> What's the find criteria you use? Maybe you can use XPath. 900 lines
> does not really sound large so I suspect there might be an algorithmic
> or design error.
>
> Kind regards
>
> robert

These are the criteria:

node = rexml_element.find_first_recursive { |n| n.attributes["again"] == "yes" }
--
Posted via http://www.ruby-....

Robert Klemme

3/30/2008 6:49:00 PM

On 30.03.2008 20:23, Bu Mihai wrote:
> Robert Klemme wrote:
>> On 29.03.2008 08:46, Bu Mihai wrote:
>>> I have an xml file and sometimes i call the find_first_recursive method;
>>> when my xml file is small its working fine but when i have ~900 lines im
>>> waiting ~15 seconds to return me the wanted node and i want something
>>> faster; How can i obtain a better time?
>> What's the find criteria you use? Maybe you can use XPath. 900 lines
>> does not really sound large so I suspect there might be an algorithmic
>> or design error.
>
> this is the criteria:
>
> node=rexml_element.find_first_recursive {|node|
> node.attributes["again"]=="yes"}

That's easy

doc.elements.each('//*[@again="yes"]') do |node|
  # any node that has attribute again with value yes
end

And I am pretty sure that this is faster than your approach. What does
your program do? With more context we can come up with further suggestions.
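
To make the XPath idea concrete, here is a minimal sketch, assuming a small made-up document (SAMPLE_XML is hypothetical, not the poster's real file). It uses the standard `//*[@again="yes"]` form, which matches any element with that attribute value, and REXML::XPath.first / REXML::XPath.match to fetch one hit or all hits.

```ruby
require 'rexml/document'

# Hypothetical sample, loosely modelled on the structure discussed in this thread.
SAMPLE_XML = <<~DOC
  <root>
    <pages>
      <page again="yes">page1</page>
      <page again="no">page2</page>
      <page again="yes">page3</page>
    </pages>
  </root>
DOC

doc = REXML::Document.new(SAMPLE_XML)

# First element (of any name) whose "again" attribute equals "yes".
first_hit = REXML::XPath.first(doc, '//*[@again="yes"]')

# Every such element in the document.
all_hits = REXML::XPath.match(doc, '//*[@again="yes"]')

puts first_hit.text
puts all_hits.map(&:text).inspect
```

Whether this beats find_first_recursive on a given file is something to measure rather than assume; the sketch only shows the calls.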

Kind regards

robert

Bu Mihai

3/30/2008 8:22:00 PM

Robert Klemme wrote:
> On 30.03.2008 20:23, Bu Mihai wrote:
>> this is the criteria:
>>
>> node=rexml_element.find_first_recursive {|node|
>> node.attributes["again"]=="yes"}
>
> That's easy
>
> doc.elements.each('//*[@again="yes"]') do |node|
>   # any node that has attribute again with value yes
> end
>
> And I am pretty sure that this is faster than your approach. What does
> your program do? With more context we can come up with further
> suggestions.
>
> Kind regards
>
> robert

I'm not sure that will work. I have an XML file with this
structure (it must be like this; the following is a simplified
sample of the original):
<root>
<new_section>
<pages>
<page again="yes">page1</page>
<page again="no">page2</page>
<page again="yes">page3
<pages>
<page again="no">page4</page>
<page again="yes">
<pages>....and so on
</pages>
</page>

</pages>
</new_section>
<new_section>
</root>

I have a recursive function that finds all 'page' nodes whose 'again'
attribute is 'yes', but I need to start the search from the beginning of
the file or from the current node and then display all subnodes with
'yes'. After all the nodes have been found, I need to search them again
from the beginning of the file. It's something like this:

def find(xml_file)
  node = xml_file.find_first_recursive { |n| n.attributes["again"] == "yes" }
  if node
    puts node.text
    find(xml_file.elements[node])
  else
    find(xml_file.elements["//"])
  end
end

In this example the find function is an endless loop; somewhere I must
put a return. But I need something like that, and when my file is big
(~900 lines) I wait ~10 seconds for this command (not always, only when
I'm starting the search from the beginning of the file):
node = xml_file.find_first_recursive { |n| n.attributes["again"] == "yes" }

Many thanks for your help Robert.
--
Posted via http://www.ruby-....

Robert Klemme

3/30/2008 9:45:00 PM

On 30.03.2008 22:21, Bu Mihai wrote:
> Robert Klemme wrote:
>> On 30.03.2008 20:23, Bu Mihai wrote:
>>> this is the criteria:
>>>
>>> node=rexml_element.find_first_recursive {|node|
>>> node.attributes["again"]=="yes"}
>> That's easy
>>
>> doc.elements.each('//*[@again="yes"]') do |node|
>>   # any node that has attribute again with value yes
>> end
>>
>> And I am pretty sure that this is faster than your approach. What does
>> your program do? With more context we can come up with further
>> suggestions.
>
> I'm not sure if that will works, i have a xml file with this
> structure(and it must be like this, the following example is a simple
> sample of the original):
> <root>
> <new_section>
> <pages>
> <page again="yes">page1</page>
> <page again="no">page2</page>
> <page again="yes">page3
> <pages>
> <page again="no">page4</page>
> <page again="yes">
> <pages>....and so on
> </pages>
> </page>
>
> </pages>
> </new_section>
> <new_section>
> </root>
>
> I have a recursive function to find all 'page' nodes with attribute
> 'again' 'yes but i need to start the searc from the beging of the file
> or from the current node and the display all subnodes with 'yes';

You can use the XPath from the root and, I believe, also from a
particular node.
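
A small sketch of both starting points, assuming a made-up document in which the id attributes are hypothetical and exist only to make the results easy to read. A path beginning with `//` searches the whole document; a path beginning with `.//` searches only below the element passed as the context.

```ruby
require 'rexml/document'

# Hypothetical nested sample; the "id" attributes are added for illustration only.
doc = REXML::Document.new(<<~DOC)
  <root>
    <pages>
      <page again="yes" id="a">
        <pages>
          <page again="yes" id="b"/>
          <page again="no" id="c"/>
        </pages>
      </page>
      <page again="yes" id="d"/>
    </pages>
  </root>
DOC

# Search the whole document, starting from the root.
from_root = REXML::XPath.match(doc, '//page[@again="yes"]')

# Search only the descendants of one particular node.
start = REXML::XPath.first(doc, '//page[@id="a"]')
below_start = REXML::XPath.match(start, './/page[@again="yes"]')

puts from_root.map { |e| e.attributes['id'] }.inspect
puts below_start.map { |e| e.attributes['id'] }.inspect
```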

> after
> the all nodes was founded then i need to search them again from the
> begining of the file;

When I asked what your program does, I really meant: Can you explain in
non technical words what this program is supposed to do? Since you seem
to traverse over the same nodes over and over again I have the strong
feeling that there is a better alternative - but for that we need to
know the purpose of the program.

> Many thanks for your help Robert.

You're welcome.

Kind regards

robert

Bu Mihai

3/31/2008 8:41:00 AM

I'm trying to build a map and to memorize all routes. I have a root node
which generates some roads, and each road generates other roads,
and I have to travel all roads until no road is left unchecked.
If I'm on a road and that road generates new roads, then to travel each
generated road I must begin my route from the beginning, not from the road
that generated them.

<root>
  <roads>
    <road testit="yes" again="yes" duplicate_road="no">road1</road>
    <roads>
      <road testit="no" again="yes" duplicate_road="no">road3</road>
    </roads>
    <road testit="no" again="yes" duplicate_road="no">road2</road>
    <roads>
      <road testit="no" again="no" duplicate_road="yes">road3</road>
    </roads>
  </roads>
</root>

The root node generates two roads, road1 and road2. I must verify these
roads and check whether each one generates new roads; if it does, I must
set again="yes" because that road has children that must be checked. For
example, road1 generates road3, but to get to road3 I must go
root->road1->road3, and so on (if road3 generates three more roads, then
to travel one of them I must go root->road1->road3->road3_1, or road3_2,
or road3_3).

I also need a duplicate_road attribute. For example, if road2 also
generates road3, I compare it against all roads checked up to that
moment; if it is found, it is a duplicate road, so I must not check it
again (again="no").

That way I can generate an XML map of the roads (for the moment I don't
care which path is shortest, only that I can find a path from the root to
road_x based on the XML map).

Tnx.

--
Posted via http://www.ruby-....

Robert Klemme

3/31/2008 9:22:00 AM

2008/3/31, Bu Mihai <mihai.bulhac@yahoo.com>:
> Im trying to build a map and to memorize all routes. I have a root node
> wich will generate some roads and each road will generate another roads
> and i have to go on all roads until there is no road unchecked.
> If im on a road and that road generates new roads then to go an all
> generated road i must begin my route from the begining not from the road
> who generates his child roads.

Ok, a pretty straightforward graph problem. It is a bad idea to do
that on the raw XML data. You should create a representation of the
road data that suits your algorithm better. Then read the whole XML
only once, create that representation and implement your algorithm on
your internal representation. Doing it on the XML is certainly the
worst option.
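
A rough sketch of that approach, under some loudly stated assumptions: the sample below is a trimmed version of the XML posted earlier, a `<roads>` element is assumed to hold the children of the `<road>` that comes right before it, and build_graph and routes_from are hypothetical helper names. The XML is read once into a plain Hash, and the search then runs on the Hash only.

```ruby
require 'rexml/document'

# Trimmed version of the sample from this thread (attributes omitted).
SAMPLE = <<~XML
  <root>
    <roads>
      <road>road1</road>
      <roads>
        <road>road3</road>
      </roads>
      <road>road2</road>
      <roads>
        <road>road3</road>
      </roads>
    </roads>
  </root>
XML

# Read the XML once and build an adjacency list:
# road name => names of the roads it generates.
# Assumption: a <roads> element belongs to the <road> just before it.
def build_graph(roads_element, parent, graph)
  last_road = nil
  roads_element.elements.each do |el|
    case el.name
    when 'road'
      last_road = el.text.strip
      graph[parent] << last_road
    when 'roads'
      build_graph(el, last_road, graph) if last_road
    end
  end
  graph
end

# Breadth-first walk that records the first route found to every road.
# A road that already has a route is treated as a duplicate and skipped.
def routes_from(graph, start)
  routes = { start => [start] }
  queue = [start]
  until queue.empty?
    current = queue.shift
    graph[current].each do |nxt|
      next if routes.key?(nxt)
      routes[nxt] = routes[current] + [nxt]
      queue << nxt
    end
  end
  routes
end

doc = REXML::Document.new(SAMPLE)
graph = Hash.new { |h, k| h[k] = [] }
build_graph(doc.root.elements['roads'], 'root', graph)

routes = routes_from(graph, 'root')
routes.each { |road, route| puts "#{road}: #{route.join(' -> ')}" }
```

The visited check in routes_from plays the role of the duplicate_road attribute, so nothing needs to be written back into the XML while searching.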

Kind regards

robert

--
use.inject do |as, often| as.you_can - without end
