[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Database-like Algorithmic Problem

Christophe Mckeon

12/18/2007 3:55:00 PM

hi all,

ok here's the scoop. i have a collection of hashes all with an arbitrary
number of symbolic keys pointing to arbitrary objects. i have a template
hash also containing symbol keys pointing to 'matching objects', i.e.
objects to be matched with the objects in the aforementioned collection,
and i want to pull out the first match. an example:

# please don't psychoanalyze the below :)
collection = [
{ :foo => 'python', :bar => 'php', :x => 69 },
{ :foo => 'scheme', :mama => 'ruby', :papa => 'C' },
{ :x => 386 },
{ :mama => 'ruby', :papa => 'simula', :lil_bro => 'fortran' }
]

template1 = { :mama => 'ruby', :papa => 'C' }
template2 = { :x => Numeric }

match_first(template1, collection)
# should return: { :foo => 'scheme', :mama => 'ruby', :papa => 'C' }
match_all(template2, collection)
# should return: [{:x => 386 }, {:foo => 'python', :bar => 'php', :x =>
69}]

so obviously === is being used internally as is evidenced by Numeric in
template2, but what kind of datastructures/algorithms should i be
looking at for the most efficient (time-wise) implementation, keeping in
mind, there may be many hashes in the collection and those hashes are
generally not very large, i.e. contain few key/value pairs. i'm not too
savy on the taxonomy of algorithms so any pointers to what i should be
looking at to read up on would be appreciated.

my potentially naive solution is something like the following (in full
color http://pastie.caboo...):

require 'set'

@collection = {}

def add hash
hash.each do |key,value|
(@collection[key] ||= Set.new) << hash
end
end

def candidates template
sets = []
template.keys.each do |key|
sets << @collection[key] if @collection.include? key
end
if sets.empty?
return sets
else
return sets.inject(sets.pop) do |intersection, set|
intersection & set
end
end
end

def match template, hash
template.each do |key, match|
return false unless match === hash[key]
end
return true
end

def match_first template
candidates(template).each do |hash|
return hash if match(template, hash)
end
return nil
end

def match_all template
return candidates(template).select do |hash|
match(template, hash)
end
end

[
{ :foo => 'python', :bar => 'php', :x => 69 },
{ :foo => 'scheme', :mama => 'ruby', :papa => 'C' },
{ :x => 386, :papa => 'clause' },
{ :mama => 'ruby', :papa => 'simula', :lil_bro => 'fortran' },
{ :mama => 1946, :boo => 'far', :x => 'x'}
].each do |hash|
add hash
end

puts match_first({ :mama => 'ruby', :papa => 'C' }).inspect
puts match_all({:foo => String}).inspect

thanks for reading this far!
_c
--
Posted via http://www.ruby-....

8 Answers

Robert Klemme

12/18/2007 4:25:00 PM

0

2007/12/18, Christophe Mckeon <chromatophore@gmail.com>:
> ok here's the scoop. i have a collection of hashes all with an arbitrary
> number of symbolic keys pointing to arbitrary objects. i have a template
> hash also containing symbol keys pointing to 'matching objects', i.e.
> objects to be matched with the objects in the aforementioned collection,
> and i want to pull out the first match. an example:
>
> # please don't psychoanalyze the below :)

You are crazy. Please contact a doctor ASAP. Oh, sorry. :-)

> collection = [
> { :foo => 'python', :bar => 'php', :x => 69 },
> { :foo => 'scheme', :mama => 'ruby', :papa => 'C' },
> { :x => 386 },
> { :mama => 'ruby', :papa => 'simula', :lil_bro => 'fortran' }
> ]
>
> template1 = { :mama => 'ruby', :papa => 'C' }
> template2 = { :x => Numeric }
>
> match_first(template1, collection)
> # should return: { :foo => 'scheme', :mama => 'ruby', :papa => 'C' }
> match_all(template2, collection)
> # should return: [{:x => 386 }, {:foo => 'python', :bar => 'php', :x =>
> 69}]
>
> so obviously === is being used internally as is evidenced by Numeric in
> template2, but what kind of datastructures/algorithms should i be
> looking at for the most efficient (time-wise) implementation, keeping in
> mind, there may be many hashes in the collection and those hashes are
> generally not very large, i.e. contain few key/value pairs. i'm not too
> savy on the taxonomy of algorithms so any pointers to what i should be
> looking at to read up on would be appreciated.
>
> my potentially naive solution is something like the following (in full
> color http://pastie.caboo...):

<snip/>

> thanks for reading this far!

I think you are on a pretty good way. Here's what I would do differently:

1. Do no recreate all those candidate collections for every
match_first call. Instead, stuff everything into a class and provide a
method that will create an /index/ of the data (see 2).

2. I would use Hashes to store your indexes, i.e. instead of creating
candidates all over again, create them once (see 1) and keep them
there.

Kind regards

robert

--
use.inject do |as, often| as.you_can - without end

Christophe Mckeon

12/18/2007 5:46:00 PM

0


> You are crazy. Please contact a doctor ASAP. Oh, sorry. :-)

:)

>
> I think you are on a pretty good way. Here's what I would do
> differently:
>
> ...

thanks robert. yeah i'd like to do some caching, but as hashes are
constantly
added and removed from the collection, i'll have to weigh up the
overhead of managing a cache, maybe even implement it and do some
profiling w/ and w/o. guess i'll have to read up on cache algs.

regards,
_c

--
Posted via http://www.ruby-....

Jay Levitt

12/18/2007 6:21:00 PM

0

On Tue, 18 Dec 2007 10:54:54 -0500, Christophe Mckeon wrote:

> # please don't psychoanalyze the below :)

Does it please you to believe that I please don't psychoanalyze the below?

--
Jay Levitt |
Boston, MA | My character doesn't like it when they
Faster: jay at jay dot fm | cry or shout or hit.
http://... | - Kristoffer

Dan Yoder

12/18/2007 7:30:00 PM

0

Interesting problem. Aside from seeing a doctor, my suggestions are in
the comments in the revised variant below:

---

require 'set'

# encapsulate
class DB

# use hash's default syntax
def initialize
@c = Hash.new{ | hash, key | hash[key] = Set.new }
end

# was: add, use << to follow ruby idiom for array, set
# store val with hash for quicker comparison
def <<(hash)
hash.each do |key,val|
@c[key] << [ val, hash ]
end
end

# need a delete operation ...
def delete(hash)
@c.delete(hash)
end

# was: match_all, use select to follow ruby enumerable idiom
# determine candidates on the fly, avoid intersection code using Set
def select(template)
good = Set.new; bad = Set.new
template.each do |key,val1|
(@c[key]).each do |val2,hash|
( val1 === val2 ? good : bad ) << hash unless bad.include?
hash
end
end
return good.to_a
end

# was: match_first, use select to follow ruby enumerable idiom
# determine candidates on the fly, avoid intersection code using Set
def find(template)
good = Set.new; bad = Set.new
template.each do |key,val1|
(@c[key]).each do |val2,hash|
( val1 === val2 ? good : bad ) << hash unless bad.include?
hash
end
return good.to_a.first unless good.empty?
end
return nil
end
end


---

HTH!

Dan Yoder
http://dev.ze...

On Dec 18, 9:45 am, Christophe Mckeon <chromatoph...@gmail.com> wrote:
> > You are crazy. Please contact a doctor ASAP. Oh, sorry. :-)
>
> :)
>
>
>
> > I think you are on a pretty good way. Here's what I would do
> > differently:
>
> > ...
>
> thanks robert. yeah i'd like to do some caching, but as hashes are
> constantly
> added and removed from the collection, i'll have to weigh up the
> overhead of managing a cache, maybe even implement it and do some
> profiling w/ and w/o. guess i'll have to read up on cache algs.
>
> regards,
> _c
>
> --
> Posted viahttp://www.ruby-....

Christophe Mckeon

12/19/2007 9:24:00 PM

0

cruiserdan wrote:

> class DB
> ...
> end

thanks for that, i benchmarked it and it is many times faster. now i
have to see how i'm going to do caching etc..

_c
--
Posted via http://www.ruby-....

Christophe Mckeon

12/19/2007 9:29:00 PM

0

Christophe Mckeon wrote:
> cruiserdan wrote:
>
>> class DB
>> ...
>> end
>
> thanks for that, i benchmarked it and it is many times faster. now i
> have to see how i'm going to do caching etc..
>
> _c

oops, i read my benchmarks wrong, it is actually not *that* much faster

yours: 9.020000 0.940000 9.960000 ( 11.283543)
mine: 9.220000 1.380000 10.600000 ( 11.127394)

but is an improvement, so thanks. i'll do some more comprehensive
benchmarks
and post them here in a bit.

--
Posted via http://www.ruby-....

Christophe Mckeon

12/19/2007 10:23:00 PM

0

there's a bug in yours. not sure where yet. try

d = DB.new
d << {:a => 2, :b => 1, :c => 'b'}
puts d.find({ :a => Numeric, :b => 'a' }).inspect

returns {:a => 2, :b => 1, :c => 'b'}

--
Posted via http://www.ruby-....

Dan Yoder

12/19/2007 10:59:00 PM

0

Okay, here's a fixed version.

There were 2 problems. The first was that I forgot to remove the bad
(unmatched) hashes before returning a result. The second was that you
have to go through every key in the template to find a match (I was
returning after the first match). Thus find is no more efficient than
select due to the way the data is stored.

I also took out the 'unless bad.include? hash' clause from the compare
because I think it actually is slower that way, depending on what you
are comparing. I thought it would be faster than the original version
posted because I don't have to generate a set of candidates first, so
the initial benchmarks were disappointing. But maybe this version will
do even better. I look forward to seeing your results.

---

# new version of select
def select(template)
good = Set.new; bad = Set.new
template.each do |key,val1|
(@c[key]).each do |val2,hash|
( val1 === val2 ? good : bad ) << hash # unless bad.include?
hash
end
end
return (good - bad).to_a
end

# new version of find
def find(template)
r = select template
r.first unless r.empty?
end


---

Regards,
Dan


On Dec 19, 2:23 pm, Christophe Mckeon <chromatoph...@gmail.com> wrote:
> there's a bug in yours. not sure where yet. try
>
> d = DB.new
> d << {:a => 2, :b => 1, :c => 'b'}
> puts d.find({ :a => Numeric, :b => 'a' }).inspect
>
> returns {:a => 2, :b => 1, :c => 'b'}
>
> --
> Posted viahttp://www.ruby-....