Asp Forum - REXML document creation speed

tedmilkey

2/19/2008 5:30:00 AM

Hi,

Please excuse my ignorance, but I'm new to this.

I've written a script that downloads historical stock quotes as .csv,
parses it, then writes out an XML doc that I can then use elsewhere.

It works as designed, but it's *dog slow* and I don't see why. It can
take over 5 minutes to run, and nearly all of that time is writing the
XML docs (I've tested it running the script with the XML document
creation lines commented out and it takes only seconds).

The script is below:

require 'rubygems'
require 'net/http'
require 'FasterCSV'
require 'rexml/document'
include REXML
puts "Start #{Time.now()}"
symbols = Array.new
xml_symbols_doc = Document.new(File.new("symbols.xml"))
for i in 1..xml_symbols_doc.root.elements.size
  symbols[i] = xml_symbols_doc.root.elements[i].get_text.value
end
for i in 1..(symbols.length - 1)
  quote_source_url = "http://ichart.finance.yahoo.com/...s=#{URI.encode(symbols[i])}"
  quote_response = Net::HTTP.get_response(URI.parse(quote_source_url))
  csv_quotes = FasterCSV.parse(quote_response.body,
                               {:headers => true, :header_converters => :symbol})
  xml_quotes = Document.new
  xml_quotes << XMLDecl.new
  xml_quotes.add_element("quotes", {"symbol" => "#{symbols[i]}"})
  for j in 0..(csv_quotes.length - 1)
    quote = Element.new("quote")
    for k in 0..(csv_quotes.headers.length - 1)
      quote.add_element("#{csv_quotes.headers()[k]}").text = "#{csv_quotes[j][k]}"
    end
    xml_quotes.root << quote
  end
  xml_quotes_output_file = File.new("#{symbols[i]}.xml", "w+")
  xml_quotes.write(xml_quotes_output_file, 3)
  puts "#{symbols[i]} OK. File here: #{xml_quotes_output_file.path}"
end
puts "End #{Time.now()}"

Any suggestions as to how I can make this script run (much) faster are
greatly appreciated!

Thanks for your help!
Ted

2 Answers

Robert Klemme

2/19/2008 6:51:00 AM

On 19.02.2008 06:29, tedmilkey@gmail.com wrote:
> Hi,
>
> Please excuse my ignorance, but I'm new to this.
>
> I've written a script that downloads historical stock quotes as .csv,
> parses it, then writes out an XML doc that I can then use elsewhere.
>
> It works as designed, but it's *dog slow* and I don't see why. It can
> take over 5 minutes to run, and nearly all of that time is writing the
> XML docs (I've tested it running the script with the XML document
> creation lines commented out and it takes only seconds).

REXML is not particularly fast, but what makes you sure that the time is
spent in REXML and not in the way you prepare the data? Did you test with
"-r profile"? Did you notice that you have three nested levels of loops?
That may well be the source of the slowness.

A few stylistic remarks: you should use the block form of File.open in
order to ensure proper and timely cleanup.
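To illustrate the block form, here is a minimal sketch. The helper name write_text and the temporary file path are made up for this example and are not part of the original script; the point is only that the file is closed when the block exits, even if an exception is raised inside it.

```ruby
require 'tmpdir'

# Hypothetical helper (not from the original script): writes +text+ to +path+.
# File.open with a block closes the file when the block exits,
# even if the block raises an exception.
def write_text(path, text)
  File.open(path, "w") do |file|
    file.write(text)
  end
end

# Assumed location for the example: a throwaway temporary directory.
dir = Dir.mktmpdir
path = File.join(dir, "example.xml")
write_text(path, "<quotes/>\n")
content = File.read(path)
```

With File.new, as in the posted script, the handle stays open until it is closed explicitly or garbage collected, so buffered output may reach the disk later than expected.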

You can make your life easier by using Ruby's iterating idioms rather than
for loops with array indexes.
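As a rough sketch of what that could look like, the snippet below pairs header names with row values using map and zip instead of index counters. The sample data and the helper name rows_to_records are invented for illustration; plain arrays stand in for the parsed CSV table.

```ruby
# Sample data standing in for parsed CSV headers and rows (assumed shape).
headers = [:date, :open, :close]
rows = [
  ["2008-02-15", "10.0", "10.5"],
  ["2008-02-19", "10.5", "10.2"]
]

# Hypothetical helper: pairs each header with the value in the same column,
# without any explicit index variables.
def rows_to_records(headers, rows)
  rows.map do |row|
    headers.zip(row).map { |name, value| "#{name}=#{value}" }
  end
end

records = rows_to_records(headers, rows)
records.each { |record| puts record.join(", ") }
```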

Note also that there are XPath expressions that you can use for iterating
over an XML document.
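A small sketch of that idea, assuming symbols.xml has a root element with one child element per ticker symbol. The element names symbols and symbol are guesses, since the file itself was not posted.

```ruby
require 'rexml/document'

# Assumed layout of symbols.xml; the real file was not shown in the thread.
xml = <<~XML
  <symbols>
    <symbol>GOOG</symbol>
    <symbol>MSFT</symbol>
  </symbols>
XML

doc = REXML::Document.new(xml)

# Collect the text of every element matched by the XPath expression.
symbols = []
REXML::XPath.each(doc, "/symbols/symbol") { |element| symbols << element.text }
```

This also avoids the 1-based index bookkeeping in the posted loop, which leaves symbols[0] unset.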

> The script is below:

<snip/>

Kind regards

robert

Dejan Dimic

2/19/2008 7:58:00 AM

In my personal experience, Hpricot was much faster than REXML.
As Robert already mentioned, iterate through collections.
The first thing you should do is add some metrics to find out what the
slowest part of your program is. Without measurement you cannot
determine whether you are on the right track. Don't just time it from
start to finish; add some checkpoints as well.
As you have a list of symbols to download, you should take a
multithreaded approach to it.
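A minimal sketch of the threaded download, with a placeholder standing in for the real HTTP request. The method fetch_quotes and the symbol list are hypothetical. Threads are likely to help only with time spent waiting on the network, not with the XML building itself.

```ruby
# Hypothetical stand-in for the real download; sleep simulates network latency.
def fetch_quotes(symbol)
  sleep 0.05
  "csv for #{symbol}"
end

symbols = %w[GOOG MSFT IBM]

# Start one thread per symbol so the waits overlap instead of adding up.
threads = symbols.map do |symbol|
  Thread.new(symbol) { |sym| [sym, fetch_quotes(sym)] }
end

# Thread#value joins the thread and returns the block's result.
results = threads.map(&:value).to_h
```

For a long symbol list it would probably be kinder to the server to cap the number of concurrent threads rather than start one per symbol.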

You guessed right: there is a lot of room to improve this
application, but measure the performance first, then act, and then measure
the improvement.
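One way to add such checkpoints is sketched below. The helper name timed and the stage labels are made up for the example, and the blocks are trivial placeholders for the real download, parse and XML-building steps.

```ruby
# Hypothetical helper: runs the block and adds its elapsed time (in seconds)
# to +timings+ under +label+. Returns the block's own result.
def timed(label, timings)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  timings[label] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  result
end

timings = Hash.new(0.0)

# Placeholder work; in the real script these would wrap each stage of the loop.
parsed = timed(:parse, timings) { "a,b,c".split(",") }
built  = timed(:build_xml, timings) { parsed.map { |v| "<v>#{v}</v>" }.join }

timings.each { |label, seconds| puts format("%-10s %.6fs", label, seconds) }
```

Because the totals accumulate per label, the helper can be called inside the per-symbol loop and the summary printed once at the end.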

comp.lang.ruby
