comp.lang.ruby

REXML Optimization

Bucco

6/24/2005 3:53:00 AM

So, I'm reading this document on XML and on using XML as a database. Of
course the author uses this cryptic Perl script to parse the XML file:

#!/usr/bin/perl
use XML::LibXML;
my $parser = new XML::LibXML;
my $doc = $parser->parse_file( shift @ARGV );
my $balance = $doc->findvalue( '/checkbook/@balance-start' );
foreach my $record ( $doc->findnodes( '//debit' )) {
$balance -= $record->findvalue( 'amount' );
}
foreach my $record ( $doc->findnodes( '//deposit' )) {
$balance += $record->findvalue( 'amount' );
}
print "Current balance: $balance\n";

So, since I was trying to figure out how to use XML as a database and
how to use REXML, I gave the script a whack and tried to write a Ruby
script to do the same thing. Below is a sample of the XML file and the
Ruby script:

<?xml version="1.0"?>
<checkbook balanceStart="2460.62">
<title>expenses: january 2002</title>

<debit category="clothes">
<amount>31.19</amount>
<date><year>2002</year><month>1</month><day>3</day></date>
<payto>Walking Store</payto>
<description>shoes</description>
</debit>

<deposit category="salary">
<amount>1549.58</amount>
<date><year>2002</year><month>1</month><day>7</day></date>
<payor>Bob's Bolts</payor>
</deposit>
</checkbook>


#!/usr/bin/ruby -w
require 'rexml/document'

# Read in XML doc
doc = REXML::Document.new(File.open('cb.xml'))
# Future versions should take the file name from the command line

# Find the starting balance and assign it to the float variable 'balance'
balance = doc.root.attributes['balanceStart'].to_f

# Calculate debits and balance
doc.elements.each("//debit/amount") {|o| balance -= o.text.to_f}
# Calculate deposits and balance
doc.elements.each("//deposit/amount") {|i| balance += i.text.to_f}

# Display final balance:
puts balance

Of course I was able to complete the same task as the Perl script in
Ruby with less code. (Not to mention easier-to-read code.)

Just to help me complete the learning process, I wish to pose the
question to the group: Is there a better way to do this, and is there
more optimization I can do to my code?

Thanks:)

SA

4 Answers

Ryan Leavengood

6/24/2005 4:07:00 AM

Bucco said:
>
> Just to help me complete the learning process, I wish to pose the
> question to the group: Is there a better way to do this, and is there
> more optimization I can do to my code?

It looks good to me. In fact, I didn't find the Perl that bad (which is
unusual for me...I'm not a big Perl fan.)

But you seem to have the Ruby "style" down pretty well.

Ryan



james_b

6/24/2005 5:10:00 AM

Bucco wrote:
> Just to help me complete the learning process, I wish to pose the
> question to the group: Is there a better way to do this, and is there
> more optimization I can do to my code?


The DOM is not a database, and it shows.

XPath queries can get real slow as the document size grows.

Suggestion: Read and parse the XML once, and store it internally in a
format better suited for queries. XML is great for all sorts of things,
particularly for inter-app data exchange, but once the data is inside
your system that value drops. So, if the code is mainly concerned with
executing queries and such, slurp in the XML and stash it in some
optimized internal structure. Maybe use Madeleine for in-memory storage
and queries.
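
A minimal sketch of the "parse once, then query an internal structure" idea, using a plain Ruby array of structs rather than Madeleine. The `Entry` struct and the inline `sample_xml` string are assumptions made so the example runs on its own; the original script reads cb.xml instead.

```ruby
require 'rexml/document'

# Inline copy of the sample checkbook so the sketch is self-contained.
sample_xml = <<~EOX
  <?xml version="1.0"?>
  <checkbook balanceStart="2460.62">
  <title>expenses: january 2002</title>
  <debit category="clothes">
  <amount>31.19</amount>
  <payto>Walking Store</payto>
  </debit>
  <deposit category="salary">
  <amount>1549.58</amount>
  <payor>Bob's Bolts</payor>
  </deposit>
  </checkbook>
EOX

# Entry is a hypothetical name for one debit or deposit record.
Entry = Struct.new(:kind, :category, :amount)

doc = REXML::Document.new(sample_xml)
start_balance = doc.root.attributes['balanceStart'].to_f

# Walk the root's child elements once and copy what we need into plain objects.
entries = []
doc.root.elements.each do |el|
  next unless %w[debit deposit].include?(el.name)
  entries << Entry.new(el.name.to_sym,
                       el.attributes['category'],
                       el.elements['amount'].text.to_f)
end

# Later queries run against the array and never touch the DOM again.
debits   = entries.select { |e| e.kind == :debit }.sum(&:amount)
deposits = entries.select { |e| e.kind == :deposit }.sum(&:amount)
balance  = start_balance - debits + deposits
puts format('Current balance: %.2f', balance)
```

Once the entries are in an array, other queries (say, totals per category) become ordinary `select` and `group_by` calls instead of new XPath expressions.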

If need be, add code to serialize the data back to XML for persistence
when the app is shut down.

Try to compute the start-up cost of the parsing and restructuring and
indexing the data right up front, versus the cost of running XPath calls
over and over. See if it gains you anything.
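
One rough way to make that comparison is Ruby's standard `Benchmark` module. The sketch below is only an illustration: the synthetic 200-record document and the run count of 10 are made-up values, and real timings depend on the document and the machine.

```ruby
require 'rexml/document'
require 'benchmark'

# Build a synthetic checkbook with 200 debits (hypothetical data, only for timing).
records = (1..200).map do |i|
  "<debit category=\"c#{i % 5}\"><amount>#{i}.50</amount></debit>"
end.join
doc = REXML::Document.new("<checkbook balanceStart=\"0\">#{records}</checkbook>")

runs = 10
xpath_total = nil
array_total = nil

# Option A: run the XPath query every time the total is needed.
xpath_time = Benchmark.realtime do
  runs.times do
    xpath_total = 0.0
    doc.elements.each('//debit/amount') { |a| xpath_total += a.text.to_f }
  end
end

# Option B: pay a one-time extraction cost, then query a plain array.
array_time = Benchmark.realtime do
  amounts = []
  doc.elements.each('//debit/amount') { |a| amounts << a.text.to_f }
  runs.times { array_total = amounts.sum }
end

puts format('XPath each time: %.4fs', xpath_time)
puts format('Extract once:    %.4fs', array_time)
```

If the script only needs the total once, as in the original post, the up-front restructuring is unlikely to pay off; it starts to matter when the same data is queried repeatedly.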



James

--

http://www.ru... - The Ruby Documentation Site
http://www.r... - News, Articles, and Listings for Ruby & XML
http://www.rub... - The Ruby Store for Ruby Stuff
http://www.jame... - Playing with Better Toys


Robert Klemme

6/24/2005 8:00:00 AM

Bucco wrote:
> Just to help me complete the learning process, I wish to pose the
> question to the group: Is there a better way to do this, and is there
> more optimization I can do to my code?

You could get rid of one traversal by iterating over all "amount" elements
and doing the calculation based on the parent element's type.
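
A minimal sketch of that single traversal, assuming the same document layout as the sample (every "amount" sits directly inside a "debit" or "deposit"). The inline `sample_xml` string stands in for cb.xml so the example runs on its own.

```ruby
require 'rexml/document'

# Inline copy of the sample checkbook; the original script reads cb.xml.
sample_xml = <<~EOX
  <?xml version="1.0"?>
  <checkbook balanceStart="2460.62">
  <title>expenses: january 2002</title>
  <debit category="clothes">
  <amount>31.19</amount>
  </debit>
  <deposit category="salary">
  <amount>1549.58</amount>
  </deposit>
  </checkbook>
EOX

doc = REXML::Document.new(sample_xml)
balance = doc.root.attributes['balanceStart'].to_f

# One pass over every <amount>; the parent's element name decides the sign.
doc.elements.each('//amount') do |amount|
  value = amount.text.to_f
  case amount.parent.name
  when 'debit'   then balance -= value
  when 'deposit' then balance += value
  end
end

puts format('Current balance: %.2f', balance)
```

An "amount" under any other parent is simply ignored here; whether that is the right behaviour depends on the real data.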

Kind regards

robert

Robert Klemme

6/24/2005 8:52:00 AM

Robert Klemme wrote:
> You could get rid of one traversal by iterating all "amounts" and do
> the calculation based on the parent element's type.

If you want to speed things up even more, you can do stream processing with
REXML's SAX-like API:
http://www.germane-software.com/software/rexml/docs/tutorial.html...
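
A rough sketch of what the stream approach could look like with `REXML::StreamListener` and `REXML::Document.parse_stream`. The `BalanceListener` class name and the inline `sample_xml` string are assumptions for illustration, and the listener assumes each amount arrives as a single text event.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# BalanceListener is a hypothetical name; REXML only calls the listener callbacks.
class BalanceListener
  include REXML::StreamListener

  attr_reader :balance

  def initialize
    @balance = 0.0
    @stack = [] # names of the currently open elements
  end

  def tag_start(name, attrs)
    # attrs.to_h works whether the parser passes a Hash or an array of pairs.
    @balance = attrs.to_h['balanceStart'].to_f if name == 'checkbook'
    @stack.push(name)
  end

  def tag_end(_name)
    @stack.pop
  end

  def text(data)
    # Only text directly inside <amount> matters; its parent decides the sign.
    return unless @stack.last == 'amount'

    case @stack[-2]
    when 'debit'   then @balance -= data.to_f
    when 'deposit' then @balance += data.to_f
    end
  end
end

# Inline copy of the sample checkbook; a File object for cb.xml would also work.
sample_xml = <<~EOX
  <?xml version="1.0"?>
  <checkbook balanceStart="2460.62">
  <debit category="clothes"><amount>31.19</amount></debit>
  <deposit category="salary"><amount>1549.58</amount></deposit>
  </checkbook>
EOX

listener = BalanceListener.new
REXML::Document.parse_stream(sample_xml, listener)
puts format('Current balance: %.2f', listener.balance)
```

No document tree is built, so memory use stays flat as the file grows; the cost is that the listener has to track its own position in the document, here with a small stack of element names.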

Kind regards

robert