[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Searching through a sorted array

FireAphis@gmail.com

10/3/2007 7:11:00 AM

Hello,

I have a very big array of objects sorted by one of its numeric data
members. During the flow of my application I need occasionally to get
all the elements in a specific range. I could use find_all

partition = my_array.find_all {|x| (x > start) && (x < end)}

My problem is that, as far as I know, find_all just iterates through
all the elements of the array and that's very inefficient, especially
in my case, in which I have a sorted array. Is there any standard way
to search efficiently through a sorted array? A binary search for
example?

Thanks
FireAphis

14 Answers

Peter Szinek

10/3/2007 7:30:00 AM

0

FireAphis wrote:
> Hello,
>
> I have a very big array of objects sorted by one of its numeric data
> members. During the flow of my application I need occasionally to get
> all the elements in a specific range.

Then why not

my_array[start..end] ?

Cheers,
Peter
__
http://www.rubyra...
http://s...

FireAphis@gmail.com

10/3/2007 7:48:00 AM

0

On Oct 3, 9:30 am, Peter Szinek <pe...@rubyrailways.com> wrote:
> FireAphis wrote:
> > Hello,
>
> > I have a very big array of objects sorted by one of its numeric data
> > members. During the flow of my application I need occasionally to get
> > all the elements in a specific range.
>
> Then why not
>
> my_array[start..end] ?
>
> Cheers,
> Peter
> __http://www.rubyrailways.comhttp://s...

Sorry if my explanation has been misleading. The array isn't
necessarily comprised of continuous or consecutive numbers. Moreover
they don't necessarily start from zero. For example:

my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]

now I want to get all the elements bigger than 1 and smaller than
1000. AFAIK my_array[1..1000] doesn't help in this case.

Axel Etzold

10/3/2007 7:57:00 AM

0


-------- Original-Nachricht --------
> Datum: Wed, 3 Oct 2007 16:30:03 +0900
> Von: Peter Szinek <peter@rubyrailways.com>
> An: ruby-talk@ruby-lang.org
> Betreff: Re: Searching through a sorted array

> FireAphis wrote:
> > Hello,
> >
> > I have a very big array of objects sorted by one of its numeric data
> > members. During the flow of my application I need occasionally to get
> > all the elements in a specific range.
>
> Then why not
>
> my_array[start..end] ?

Dear FireAphis,

to find the values 'start' and 'end', you could use something like
the twenty questions game to find a number between 1 and a million
(roughly 2^20, hence 20 questions), starting by :

1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly 500000)?

2.) If the answer was "yes", is the index of 'start' ('end') bigger or smaller than 2^19+2^18 (roughly 750000)?
If the answer was "no", is the index of 'start' ('end') bigger or smaller than 2^18 (roughly 250000)?

etc..

The range in which "start" lies reduces its size by half with each question, and you need 2*log(2) n questions to find the 'start' and 'end'
values.

I am actually not aware whether that's implemented in the Array#index
method.

Best regards,

Axel

--
Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
Ideal für Modem und ISDN: http://www.gmx.net/de/go/s...

Axel Etzold

10/3/2007 8:00:00 AM

0


-------- Original-Nachricht --------
> Datum: Wed, 3 Oct 2007 16:50:03 +0900
> Von: FireAphis <FireAphis@gmail.com>
> An: ruby-talk@ruby-lang.org
> Betreff: Re: Searching through a sorted array

> On Oct 3, 9:30 am, Peter Szinek <pe...@rubyrailways.com> wrote:
> > FireAphis wrote:
> > > Hello,
> >
> > > I have a very big array of objects sorted by one of its numeric data
> > > members. During the flow of my application I need occasionally to get
> > > all the elements in a specific range.
> >
> > Then why not
> >
> > my_array[start..end] ?
> >
> > Cheers,
> > Peter
> > __http://www.rubyrailways.comhttp://s...
>
> Sorry if my explanation has been misleading. The array isn't
> necessarily comprised of continuous or consecutive numbers. Moreover
> they don't necessarily start from zero. For example:
>
> my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]
>
> now I want to get all the elements bigger than 1 and smaller than
> 1000. AFAIK my_array[1..1000] doesn't help in this case.
>
In that case, you can say:

interesting_array=my_array.delete_if{|entry| entry>1000 or entry<1}

Best regards,

Axel
--
Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
Ideal für Modem und ISDN: http://www.gmx.net/de/go/s...

FireAphis@gmail.com

10/3/2007 8:14:00 AM

0

On Oct 3, 9:56 am, "Axel Etzold" <AEtz...@gmx.de> wrote:
> -------- Original-Nachricht --------
>
> > Datum: Wed, 3 Oct 2007 16:30:03 +0900
> > Von: Peter Szinek <pe...@rubyrailways.com>
> > An: ruby-t...@ruby-lang.org
> > Betreff: Re: Searching through a sorted array
> > FireAphis wrote:
> > > Hello,
>
> > > I have a very big array of objects sorted by one of its numeric data
> > > members. During the flow of my application I need occasionally to get
> > > all the elements in a specific range.
>
> > Then why not
>
> > my_array[start..end] ?
>
> Dear FireAphis,
>
> to find the values 'start' and 'end', you could use something like
> the twenty questions game to find a number between 1 and a million
> (roughly 2^20, hence 20 questions), starting by :
>
> 1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly 500000)?
>
> 2.) If the answer was "yes", is the index of 'start' ('end') bigger or smaller than 2^19+2^18 (roughly 750000)?
> If the answer was "no", is the index of 'start' ('end') bigger or smaller than 2^18 (roughly 250000)?
>
> etc..
>
> The range in which "start" lies reduces its size by half with each question, and you need 2*log(2) n questions to find the 'start' and 'end'
> values.
>
> I am actually not aware whether that's implemented in the Array#index
> method.
>
> Best regards,
>
> Axel
>
> --
> Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
> Ideal f?r Modem und ISDN:http://www.gmx.net/de/go/s...

Of course that could be an excellent solution. Do you know if there is
already such a facility in Ruby that implements the algorithm? I'm
quite sure Array#index doesn't do that since in order to apply a
binary search the array has to be sorted.

Axel Etzold

10/3/2007 9:15:00 AM

0


-------- Original-Nachricht --------
> Datum: Wed, 3 Oct 2007 17:15:12 +0900
> Von: FireAphis <FireAphis@gmail.com>
> An: ruby-talk@ruby-lang.org
> Betreff: Re: Searching through a sorted array

> On Oct 3, 9:56 am, "Axel Etzold" <AEtz...@gmx.de> wrote:
> > -------- Original-Nachricht --------
> >
> > > Datum: Wed, 3 Oct 2007 16:30:03 +0900
> > > Von: Peter Szinek <pe...@rubyrailways.com>
> > > An: ruby-t...@ruby-lang.org
> > > Betreff: Re: Searching through a sorted array
> > > FireAphis wrote:
> > > > Hello,
> >
> > > > I have a very big array of objects sorted by one of its numeric data
> > > > members. During the flow of my application I need occasionally to
> get
> > > > all the elements in a specific range.
> >
> > > Then why not
> >
> > > my_array[start..end] ?
> >
> > Dear FireAphis,
> >
> > to find the values 'start' and 'end', you could use something like
> > the twenty questions game to find a number between 1 and a million
> > (roughly 2^20, hence 20 questions), starting by :
> >
> > 1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly
> 500000)?
> >
> > 2.) If the answer was "yes", is the index of 'start' ('end') bigger or
> smaller than 2^19+2^18 (roughly 750000)?
> > If the answer was "no", is the index of 'start' ('end') bigger or
> smaller than 2^18 (roughly 250000)?
> >
> > etc..
> >
> > The range in which "start" lies reduces its size by half with each
> question, and you need 2*log(2) n questions to find the 'start' and 'end'
> > values.
> >
> > I am actually not aware whether that's implemented in the Array#index
> > method.
> >
> > Best regards,
> >
> > Axel
> >
> > --
> > Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
> > Ideal für Modem und ISDN:http://www.gmx.net/de/go/s...
>
> Of course that could be an excellent solution. Do you know if there is
> already such a facility in Ruby that implements the algorithm? I'm
> quite sure Array#index doesn't do that since in order to apply a
> binary search the array has to be sorted.
>

Dear FireAphis,

I don't know of an implemented solution, but I've implemented my own ... use at your own risk!

Best regards,

Axel


class Array
def find_lower_index(cond)
len=self.length
upper=len
max_nr=(Math.log(len)/Math.log(2)).floor
lower_index=0
pow=1
while eval(cond)==false
lower_index=2**pow
pow=pow+1
end
pow=pow-2
lower=2**pow
(pow-1).downto(0) do |x|
old_test=lower_index
lower_index=lower+2**x
if eval(cond)==false
else
lower_index=old_test
end
end
return lower_index+1
end

def find_upper_index(cond)
len=self.length
upper=len
max_nr=(Math.log(len)/Math.log(2)).floor
upper_index=0
pow=1
while eval(cond)==true
upper_index=2**pow
pow=pow+1
end
pow=pow-2
upper=2**pow
(pow-1).downto(0) do |x|
old_test=upper_index
upper_index=upper+2**x
if eval(cond)
else
upper_index=old_test
end
upper=upper_index
end
return upper_index-1
end


end

# some example
len=10**8
a=(1..len).to_a
b=5*len/2
cond='lower_index>10'
r=a.find_lower_index(cond)
cond='upper_index<100'
s=a.find_upper_index(cond)
p r,s
p a[r]
p a[s]

--
Psssst! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/mult...

7stud 7stud

10/3/2007 11:46:00 AM

0

FireAphis wrote:
> My problem is that, as far as I know, find_all just iterates through
> all the elements of the array and that's very inefficient, especially
> in my case, in which I have a sorted array. Is there any standard way
> to search efficiently through a sorted array? A binary search for
> example?
>

You can also turn your array into a hash, which will work just as well
for unsorted arrays:


class Dog
attr_reader :breed, :id

def initialize(breed, id)
@breed = breed
@id = id
end

end

#Create sorted array of Dog's:
#-----------------------------
dogs = []
(1..2_000).each do |r|
dogs << Dog.new("Akita", r*2)
end

=begin
dogs = (1..2000).inject([]) do |arr, r|
arr << Dog.new("Akita", r*2)
arr
end
=end
#----------------------------+


#Convert array to hash.
#key: dog.id
#val: Dog object
#---------------------
my_id_dog_hash = {}

class << my_id_dog_hash
attr_accessor :highest_id
end

dogs.each do |dog|
my_id_dog_hash[dog.id] = dog
end

my_id_dog_hash.highest_id = dogs[-1].id
----------------------+


#Find Dog's in range:
#-------------------
def get_dogs_in_range(id_dog_hash, range)

high_id = id_dog_hash.highest_id
if range.end > high_id
range = range.begin..high_id
end


dogs_in_range = []

range.each do |r|
dog = id_dog_hash[r]

if dog
dogs_in_range << id_dog_hash[r]
end

end
=begin
dogs_in_range = range.inject([]) do |arr1, r|
if dog = id_dog_hash[r]
arr1 << dog
end

arr1
end
=end

dogs_in_range
end
------------------+


#Test it out:
------------
results = get_dogs_in_range(my_id_dog_hash, 1_990..2000)
results.each {|dog| p dog}
-----------+

--output:--
#<Dog:0x1a70c @id=1990, @breed="Akita">
#<Dog:0x1a6e4 @id=1992, @breed="Akita">
#<Dog:0x1a6bc @id=1994, @breed="Akita">
#<Dog:0x1a694 @id=1996, @breed="Akita">
#<Dog:0x1a66c @id=1998, @breed="Akita">
#<Dog:0x1a644 @id=2000, @breed="Akita">



--
Posted via http://www.ruby-....

James Gray

10/3/2007 12:15:00 PM

0

On Oct 3, 2007, at 3:15 AM, FireAphis wrote:

> Of course that could be an excellent solution. Do you know if there is
> already such a facility in Ruby that implements the algorithm?

A recent Ruby Quiz centered around binary search algorithms, so you
can find several examples reading those solutions:

http://www.rubyquiz.com/qu...

James Edward Gray II


FireAphis@gmail.com

10/3/2007 12:15:00 PM

0

On Oct 3, 1:45 pm, 7stud -- <dol...@excite.com> wrote:
> FireAphis wrote:
> > My problem is that, as far as I know, find_all just iterates through
> > all the elements of the array and that's very inefficient, especially
> > in my case, in which I have a sorted array. Is there any standard way
> > to search efficiently through a sorted array? A binary search for
> > example?
>
> You can also turn your array into a hash, which will work just as well
> for unsorted arrays:
>
> class Dog
> attr_reader :breed, :id
>
> def initialize(breed, id)
> @breed = breed
> @id = id
> end
>
> end
>
> #Create sorted array of Dog's:
> #-----------------------------
> dogs = []
> (1..2_000).each do |r|
> dogs << Dog.new("Akita", r*2)
> end
>
> =begin
> dogs = (1..2000).inject([]) do |arr, r|
> arr << Dog.new("Akita", r*2)
> arr
> end
> =end
> #----------------------------+
>
> #Convert array to hash.
> #key: dog.id
> #val: Dog object
> #---------------------
> my_id_dog_hash = {}
>
> class << my_id_dog_hash
> attr_accessor :highest_id
> end
>
> dogs.each do |dog|
> my_id_dog_hash[dog.id] = dog
> end
>
> my_id_dog_hash.highest_id = dogs[-1].id
> ----------------------+
>
> #Find Dog's in range:
> #-------------------
> def get_dogs_in_range(id_dog_hash, range)
>
> high_id = id_dog_hash.highest_id
> if range.end > high_id
> range = range.begin..high_id
> end
>
> dogs_in_range = []
>
> range.each do |r|
> dog = id_dog_hash[r]
>
> if dog
> dogs_in_range << id_dog_hash[r]
> end
>
> end
> =begin
> dogs_in_range = range.inject([]) do |arr1, r|
> if dog = id_dog_hash[r]
> arr1 << dog
> end
>
> arr1
> end
> =end
>
> dogs_in_range
> end
> ------------------+
>
> #Test it out:
> ------------
> results = get_dogs_in_range(my_id_dog_hash, 1_990..2000)
> results.each {|dog| p dog}
> -----------+
>
> --output:--
> #<Dog:0x1a70c @id=1990, @breed="Akita">
> #<Dog:0x1a6e4 @id=1992, @breed="Akita">
> #<Dog:0x1a6bc @id=1994, @breed="Akita">
> #<Dog:0x1a694 @id=1996, @breed="Akita">
> #<Dog:0x1a66c @id=1998, @breed="Akita">
> #<Dog:0x1a644 @id=2000, @breed="Akita">
>
> --
> Posted viahttp://www.ruby-....

Neat. Not space efficient but undoubtfully more quick. But if the IDs
have large gaps ([10000, 20000, 30000]) do you think it still will be
more efficient compared to a simple iteration? Consider the fact that
a simple loop will have three iterations whereas your implementation
will have 30000 iteration.

Thanks,
FireAphis

Jesús Gabriel y Galán

10/3/2007 12:35:00 PM

0

On 10/3/07, FireAphis <FireAphis@gmail.com> wrote:
> > > I have a very big array of objects sorted by one of its numeric data
> > > members. During the flow of my application I need occasionally to get
> > > all the elements in a specific range.
> Sorry if my explanation has been misleading. The array isn't
> necessarily comprised of continuous or consecutive numbers. Moreover
> they don't necessarily start from zero. For example:
>
> my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]
>
> now I want to get all the elements bigger than 1 and smaller than
> 1000. AFAIK my_array[1..1000] doesn't help in this case.

Here is my try:

class Array
# assumes a sorted array whose elements respond appropriately to <=>
def subrange(lower, higher)
low = find_index_lower(lower)
high = find_index_higher(higher)
if (low >= length) || (high < 0)
[]
else
self[low..high]
end

end

# Finds the lowest index whose value is greater than the specified parameter
# If all values in the array are lower, returns the index after the last one
# If all values are higher, returns 0
def find_index_lower(lower)
return length if ((last <=> lower) == -1)
return 0 if ((first <=> lower) == 1)
low = 0
high = self.length - 1
while low < high
index = low + (high - low) / 2
element = self[index]
test = element <=> lower
case test
when 0:
return index + 1
when -1:
return (index + 1) if ((self[index+1] <=> lower) == 1)
low = index + 1
when 1:
return (index) if ((self[index-1] <=> lower) <= 0)
high = index - 1
end
end
end

# Finds the highest index whose value is lower than the specified parameter
# If all values in the array are higher, returns -1
# If all values are lower, returns length-1
def find_index_higher(higher)
return -1 if ((first <=> higher) >= 0)
return length - 1 if ((last <=> higher) == -1)
low = 0
high = self.length - 1
while low < high
index = low + (high - low) / 2
element = self[index]
test = element <=> higher
case test
when 0:
return index - 1
when -1:
return (index) if ((self[index+1] <=> higher) >= 0)
low = index + 1
when 1:
return (index - 1) if ((self[index-1] <=> higher) == -1)
high = index - 1
end
end
end


end

a = [5, 6, 7, 8, 9, 10, 14]
p a
puts "6 - 13: #{a.subrange(6,13).inspect}"
puts "5 - 15: #{a.subrange(5,15).inspect}"
puts "5 - 100: #{a.subrange(5, 100).inspect}"
puts "2 - 100: #{a.subrange(2, 100).inspect}"
puts "-1 - 100: #{a.subrange(-1, 100).inspect}"
puts "100 - 150: #{a.subrange(100,150).inspect}"
puts "1 - 2: #{a.subrange(1,2).inspect}"

require 'benchmark'

n = 10_000
Benchmark.bmbm do |x|
x.report("Array subrange") {
n.times {a.subrange(6,13)}
}
end

-----------------------------------------------------------

$ ruby array_subrange.rb
[5, 6, 7, 8, 9, 10, 14]
6 - 13: [7, 8, 9, 10]
5 - 15: [6, 7, 8, 9, 10, 14]
5 - 100: [6, 7, 8, 9, 10, 14]
2 - 100: [5, 6, 7, 8, 9, 10, 14]
-1 - 100: [5, 6, 7, 8, 9, 10, 14]
100 - 150: []
1 - 2: []
Rehearsal --------------------------------------------------
Array subrange 0.220000 0.020000 0.240000 ( 0.274937)
----------------------------------------- total: 0.240000sec

user system total real
Array subrange 0.190000 0.010000 0.200000 ( 0.227276)

Jesus.