Asp Forum - Re: optimizing for speed - Array#each

Stefan Rusterholz

5/23/2007 10:15:00 PM

Mike Steiner wrote:
> I ran the profiler on my program and it said that 51% of the time is
> spent
> in Array#each. Is there any way to speed this up (like using
> Array#each_index perhaps)?
>
> And is there a quicker way to do a .gsub if I'm using just strings and
> not a
> regexp?
>
> By the way, the program compares 2 databases and determines which
> records
> have been inserted and deleted, and each database has about 15,000
> records
> to compare.
>
> Mike Steiner

No. I don't think you can speed up Array#each at all. What's far more
promising is improving your algorithm in a way that it has to run less
loops. You most likely have one or even several O(n^x) algorithms with x
>= 2 in your app. Maybe you're not even aware. E.g. if you have an each
with a select in the loop you're already in O(n^2) realm.
The database comparison probably can be handled best with hashtables of
the primary keys.

Regards
Stefan

--
Posted via http://www.ruby-....

4 Answers

Tim Pease

5/23/2007 10:31:00 PM

On 5/23/07, Stefan Rusterholz <apeiros@gmx.net> wrote:
>
> No. I don't think you can speed up Array#each at all. What's far more
> promising is improving your algorithm in a way that it has to run less
> loops. You most likely have one or even several O(n^x) algorithms with x
> >= 2 in your app. Maybe you're not even aware. E.g. if you have an each
> with a select in the loop you're already in O(n^2) realm.
> The database comparison probably can be handled best with hashtables of
> the primary keys.
>

I was going to suggest something even more primitive -- query each
database table in some sorted fashion and dump it as a CSV file. Use
diff to compare CSV files. Obviously this won't work for tables with
blobs, but it might be a fun experiment.

Failing that, how can the database do the work for you? Let it find
the records you're interested in (it will do it faster). Put
modification timestamps in your tables to reduce the number of records
you need to search. Or use triggers to log all transactions to a
separate table so you can run those same transactions on the other
database.

Just some thoughts.

Blessings,
TwP

Robert Klemme

5/24/2007 8:37:00 AM

On 24.05.2007 00:15, Stefan Rusterholz wrote:
> Mike Steiner wrote:
>> I ran the profiler on my program and it said that 51% of the time is
>> spent
>> in Array#each. Is there any way to speed this up (like using
>> Array#each_index perhaps)?

Note that the profiler output is often irritating because time for each
also includes all the time spend in the block - and while #each is
invoked only once the block is invoked n times.

>> And is there a quicker way to do a .gsub if I'm using just strings and
>> not a regexp?

Quicker than what? Are you using the block form of gsub? Are you using
gsub! or gsub? Can you show some code?

>> By the way, the program compares 2 databases and determines which
>> records
>> have been inserted and deleted, and each database has about 15,000
>> records
>> to compare.

15,000 does not sound like much. What is the runtime of the program
without profile?

And another idea: could be that you are using inefficient methods to
test for existence. Finding something in an Array is always O(n) unless
you have additional constraints; i.e. array content is sorted in which
case you can use binary search which is O(log(n)). It seems you want to
be doing set operations - so why don't you try to use Set for which
member tests are far more efficient (O(1) because it's using hashing
internally).

Kind regards

robert

Stefan Mahlitz

5/24/2007 8:23:00 PM

Mike Steiner wrote:
> One of the things I'm doing is a lot of File.join's, and I have to do a
> gsub
> ( /\// , "\\" ) after each one (because I'm running under Windows).

Are you using the strings generated by File.join outside of ruby?

If not you can simply use them without gsub.

Stefan

Robert Klemme

5/24/2007 9:02:00 PM

2007/5/24, Mike Steiner <mikejaysteiner@gmail.com>:
> One of the things I'm doing is a lot of File.join's, and I have to do a gsub
> ( /\// , "\\" ) after each one (because I'm running under Windows).
>
> I can't use sets because I have duplicate records. There is no unique record
> key, so for each record (which I use as a key in a hash), I have an array
> listing the files it appears in (which can include the same file more than
> once).

Can you be a bit more specific on what exactly you are doing?

robert

--
Have a look: http://www.flickr.com/photos/fu...

comp.lang.ruby

Re: optimizing for speed - Array#each

Stefan Rusterholz

Tim Pease

Robert Klemme

Stefan Mahlitz

Robert Klemme

x Login to ForumsZone