Asp Forum - Indexed arrays, delete_if, and performance

Jason Leong

11/1/2008 6:12:00 AM

Dear all,

My application uses a calendar which I'm populating using an array of
indexed RollingEvent objects. A RollingEvent is one of a series,
multiplied from its original Event's repeat options (daily, weekly,
fortnightly etc). The idea is that when a user saves an Event, the app
uses a loop to generate and store all the RollingEvents for the coming
year.

So a new Event is added, add_rolling_event is called within the loop
(I've omitted most of the params for readability):

def add_rolling_event(id, title, calendar_date)
@events[calendar_date.to_date] ||= []
@events[calendar_date.to_date] << RollingEvent.new(id, title,
calendar_date)
end

The indexing works well, as in the calendar view I can quickly pull out
the RollingEvent objects for that specific day. For the moment, whenever
an Event is edited the entire @events array is reset and a routine loops
through all Events to re-generate RollingEvents.

I'm trying to speed up the process by just deleting the RollingEvents
pertinent to the Event that's being edited, then calling
add_rolling_event for the Event in question.

I have 3 questions:

1. Is using an indexed array actually faster than picking out events on
the day using @events.select { |e| e.calendar_date == date }? It would
be good to uncomplicate this if possible.

2. It seems that if I were to use delete_if on an indexed array, I'd
have to loop through the array of RollingEvent objects for each indexed
day. Am I right in assuming that this is uber-inefficient? Is there a
better way to do this?

3. Currently @events (in a model called Event_Roster) and RollingEvents
are all non-database-backed for performance, and stored in a memcached
session. My concern has always been that - hypothetically - one could
have 10 daily Events, resulting in 3650 RollingEvents. This doesn't seem
to be a scalable solution, and the read/writes to and from the database
would be costly. Would you advise writing all RollingEvents to a
database, and is the performance hit negligable?

Thanks!
--
Posted via http://www.ruby-....

5 Answers

Robert Klemme

11/1/2008 10:30:00 AM

On 01.11.2008 07:11, Jason Leong wrote:
> Dear all,
>
> My application uses a calendar which I'm populating using an array of
> indexed RollingEvent objects. A RollingEvent is one of a series,
> multiplied from its original Event's repeat options (daily, weekly,
> fortnightly etc). The idea is that when a user saves an Event, the app
> uses a loop to generate and store all the RollingEvents for the coming
> year.
>
> So a new Event is added, add_rolling_event is called within the loop
> (I've omitted most of the params for readability):
>
> def add_rolling_event(id, title, calendar_date)
> @events[calendar_date.to_date] ||= []
> @events[calendar_date.to_date] << RollingEvent.new(id, title,
> calendar_date)
> end

You can make that a tad more efficient by doing

def add_rolling_event(id, title, calendar_date)
(@events[calendar_date.to_date] ||= []) <<
RollingEvent.new(id, title, calendar_date)
end

> The indexing works well, as in the calendar view I can quickly pull out
> the RollingEvent objects for that specific day. For the moment, whenever
> an Event is edited the entire @events array is reset and a routine loops
> through all Events to re-generate RollingEvents.
>
> I'm trying to speed up the process by just deleting the RollingEvents
> pertinent to the Event that's being edited, then calling
> add_rolling_event for the Event in question.
>
> I have 3 questions:
>
> 1. Is using an indexed array actually faster than picking out events on
> the day using @events.select { |e| e.calendar_date == date }? It would
> be good to uncomplicate this if possible.

What do you mean by "indexed array"? I am not sure what your #to_date
returns so you might be using a Hash or an Array. Ultimately when it
comes to "faster" you need to implement different variants and benchmark
them. Fortunately this is pretty easily done in Ruby.

> 2. It seems that if I were to use delete_if on an indexed array, I'd
> have to loop through the array of RollingEvent objects for each indexed
> day. Am I right in assuming that this is uber-inefficient? Is there a
> better way to do this?

Well, it seems you have to access paths to an event: lookup by date and
lookup by event id. I do not know whether there are other access paths
(e.g. search for event name or such) but assuming for the moment that
there are just the two a solution with two Hashes seems most appropriate

class Calendar
def initialize
@ev_by_date = Hash.new {|h,k| h[k] = []}
@ev_by_id = Hash.new {|h,k| h[k] = []}
end

def add_rolling_event(id, title, calendar_date)
ev = RollingEvent.new(id, title, calendar_date)
@ev_by_date[calendar_date] = ev
@ev_by_id[id] = ev
end

def delete_by_id(id)
@ev_by_id.delete(id).each do |ev|
@ev_by_date[ev.date].delete(ev)
end
end

def delete_by_date(date)
@ev_by_date.delete(date).each do |ev|
@ev_by_id[ev.id].delete(ev)
end
end
end

> 3. Currently @events (in a model called Event_Roster) and RollingEvents
> are all non-database-backed for performance, and stored in a memcached
> session. My concern has always been that - hypothetically - one could
> have 10 daily Events, resulting in 3650 RollingEvents. This doesn't seem
> to be a scalable solution, and the read/writes to and from the database
> would be costly. Would you advise writing all RollingEvents to a
> database, and is the performance hit negligable?

There are a lot other factors that you need to consider when thinking
about the database. 3600 events is certainly a small number, even when
read from a flat file. You rather want a DB if your application needs
to run on multiple machines concurrently and you want to have a single
repository etc. As long as you can foresee that there will not be
millions of entries I would probably not complicate the application by
adding another component.

Kind regards

robert

Jason Leong

11/1/2008 8:21:00 PM

> You can make that a tad more efficient by doing
>
> def add_rolling_event(id, title, calendar_date)
> (@events[calendar_date.to_date] ||= []) <<
> RollingEvent.new(id, title, calendar_date)
> end
>

Gotcha :-)

>
> What do you mean by "indexed array"? I am not sure what your #to_date
> returns so you might be using a Hash or an Array. Ultimately when it
> comes to "faster" you need to implement different variants and benchmark
> them. Fortunately this is pretty easily done in Ruby.

Sorry, my terminology needs work - #to_date returns a date value so I'm
using a Hash. I'll certainly benchmark the variants.

> Well, it seems you have to access paths to an event: lookup by date and
> lookup by event id. I do not know whether there are other access paths
> (e.g. search for event name or such) but assuming for the moment that
> there are just the two a solution with two Hashes seems most appropriate
>

That's correct! And by highlighting the access paths you've helped me
narrow it down to one of the challenges I've come across. I have the
event id which I could use to delete an event from the array indexed by
event hash - but this doesn't delete the event indexed by id hash (or
does it? I'll give it a go).

> There are a lot other factors that you need to consider when thinking
> about the database. 3600 events is certainly a small number, even when
> read from a flat file. You rather want a DB if your application needs
> to run on multiple machines concurrently and you want to have a single
> repository etc. As long as you can foresee that there will not be
> millions of entries I would probably not complicate the application by
> adding another component.

That's good advice. Thanks Robert!

--
Posted via http://www.ruby-....

Robert Klemme

11/1/2008 10:06:00 PM

On 01.11.2008 21:21, Jason Leong wrote:

>> Well, it seems you have to access paths to an event: lookup by date and
>> lookup by event id. I do not know whether there are other access paths
>> (e.g. search for event name or such) but assuming for the moment that
>> there are just the two a solution with two Hashes seems most appropriate
>
> That's correct! And by highlighting the access paths you've helped me
> narrow it down to one of the challenges I've come across. I have the
> event id which I could use to delete an event from the array indexed by
> event hash - but this doesn't delete the event indexed by id hash (or
> does it? I'll give it a go).

Please look carefully at the code. Although untested it should do
exactly this: delete from the other Hash. Note that the return value of
Hash#delete is the deleted element. I probably should have added a
safety check to ensure nil does not cause errors. So you'd probably
rather do

def delete_by_id(id)
dlt = @ev_by_id.delete(id) and dlt.each do |ev|
@ev_by_date[ev.date].delete(ev)
end
end

Kind regards

robert

Jason Leong

11/3/2008

> Please look carefully at the code. Although untested it should do
> exactly this: delete from the other Hash. Note that the return value of
> Hash#delete is the deleted element. I probably should have added a
> safety check to ensure nil does not cause errors. So you'd probably
> rather do
>
> def delete_by_id(id)
> dlt = @ev_by_id.delete(id) and dlt.each do |ev|
> @ev_by_date[ev.date].delete(ev)
> end
> end

Ah yes! Pounded off a reply before I looked, sorry - thanks for the
safety check too, that certainly came in handy. The one difference in my
final implementation is this:

def delete_events_by_id(id)
dlt = @titles.delete(id.to_i) and dlt.each do |e|
@events[e.date].delete_if { |i| i.id == e.id }
end
end

The reason being (I think!) the object passed in for deletion in
@ev_by_date[ev.date].delete(ev) does not match up with the object in
@ev_by_date as they're two different Hashes - but a compare based on the
id works. Does that sound right?

Thanks Robert, you've taught me a bunch about Hashes!
--
Posted via http://www.ruby-....

Robert Klemme

11/3/2008 8:12:00 AM

2008/11/3 Jason Leong <jason@pocketsmith.com>:
> > Please look carefully at the code. Although untested it should do
>> exactly this: delete from the other Hash. Note that the return value of
>> Hash#delete is the deleted element. I probably should have added a
>> safety check to ensure nil does not cause errors. So you'd probably
>> rather do
>>
>> def delete_by_id(id)
>> dlt = @ev_by_id.delete(id) and dlt.each do |ev|
>> @ev_by_date[ev.date].delete(ev)
>> end
>> end
>
> Ah yes! Pounded off a reply before I looked, sorry - thanks for the
> safety check too, that certainly came in handy.

And it's needed if someone passes in a wrong id or date ("wrong"
meaning, there is no data for it).

> The one difference in my
> final implementation is this:
>
> def delete_events_by_id(id)
> dlt = @titles.delete(id.to_i) and dlt.each do |e|
> @events[e.date].delete_if { |i| i.id == e.id }
> end
> end
>
> The reason being (I think!) the object passed in for deletion in
> @ev_by_date[ev.date].delete(ev) does not match up with the object in
> @ev_by_date as they're two different Hashes - but a compare based on the
> id works. Does that sound right?

I do not know the rest of your code but in my version the same
instance was put into both Hashes. If you think about it, this is
what you want, i.e. regardless of whether you look up the event by id
or date you want to have the same instance - otherwise you would have
to manipulate two instances all the time, which makes code more
complex and also wastes memory. And in that case Array#delete is
sufficient:

irb(main):001:0> o = Object.new
=> #<Object:0x7ff9c69c>
irb(main):002:0> a = [o]
=> [#<Object:0x7ff9c69c>]
irb(main):003:0> a.delete o
=> #<Object:0x7ff9c69c>
irb(main):004:0> a
=> []

Even for multiple occurrences:

irb(main):005:0> a = [o,o]
=> [#<Object:0x7ff9c69c>, #<Object:0x7ff9c69c>]
irb(main):006:0> a.delete o
=> #<Object:0x7ff9c69c>
irb(main):007:0> a
=> []

But delete_if is ok of course as well. If you need more efficiency you
can use Set instead of Array as Hash values but then you should use
#delete because this is more efficient for Set (O(1) vs. O(n) for
delete_if).

> Thanks Robert, you've taught me a bunch about Hashes!

You're welcome!

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end

comp.lang.ruby

Indexed arrays, delete_if, and performance

Jason Leong

Robert Klemme

Jason Leong

Robert Klemme

Jason Leong

Robert Klemme

x Login to ForumsZone