comp.lang.ruby

Duplicate elements in array

Shuaib Zahda

10/28/2007 12:48:00 PM

Hello

I am trying to output the duplicate elements in an array. I looked through
the Ruby API and found the uniq method, which returns the array without
duplicates. What I want is to know which elements are duplicated.
For example:

array = ["apple", "banana", "apple", "orange"]
=> ["apple", "banana", "apple", "orange"]
array.uniq
=> ["apple", "banana", "orange"]

I want the method to tell me that apple is the duplicated element.

I tried this, but it does not work:

array - array.uniq

Any ideas?

Regards
Shuaib
--
Posted via http://www.ruby-....
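
A short sketch of why the attempt above comes back empty: Array#- removes every element that also appears in the right-hand array, however many times it occurs on the left, so subtracting array.uniq removes everything. The variable name difference is only for illustration.

```ruby
array = ["apple", "banana", "apple", "orange"]

# Array#- drops every element that is present in the right-hand array,
# no matter how often it occurs on the left, so nothing survives here.
difference = array - array.uniq
p difference   # prints []
```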

12 Answers

Mohit Sindhwani

10/28/2007 1:16:00 PM


I don't know a good way to do it, but one way to get the result would be
to track the items in a hash, since a hash holds each key only once.


I'm sure there's a better way to do it, but here's what I got.

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow"]
h = Hash.new
duplicates = []

array.each {|item|
  if h.has_key?(item) then
    duplicates << item
  else
    h[item] = 0 # it doesn't matter what we store
  end
}

puts duplicates

Cheers
Mohit.
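
The same "have I seen this before?" idea can probably also be written with the standard library's Set instead of a hash with throwaway values. This is only a sketch: the method name duplicates_with_set is made up for illustration, and it reports each duplicated value once.

```ruby
require 'set'

# Hypothetical helper: returns each duplicated value once,
# in the order in which its first repeat is encountered.
def duplicates_with_set(array)
  seen = Set.new
  dups = Set.new
  array.each do |item|
    # Set#add? returns nil when the item is already in the set.
    dups << item unless seen.add?(item)
  end
  dups.to_a
end

p duplicates_with_set(["apple", "banana", "apple", "orange", "fat", "cow", "cow"])
# prints ["apple", "cow"]
```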




Mohit Sindhwani

10/28/2007 1:16:00 PM

Sean O'Halpin wrote:
> Here's one way (I'm sure there must be a simpler approach - just can't
> think of it right now):
>
> array = ["apple", "banana", "apple", "orange"]
> counts = array.inject(Hash.new {|h,k| h[k] = 0 }) { |hash, item| hash[item] += 1; hash }
> p counts #=> {"apple"=>2, "banana"=>1, "orange"=>1}
> p counts.select { |k,v| v > 1 }.map{ |k, v| k}.flatten #=> ["apple"]
>
>
> Regards,
> Sean
>
>
I so have to get the hang of inject, flatten and map.

Cheers,
Mohit.
10/28/2007 | 9:16 PM.




Robert Klemme

10/28/2007 1:36:00 PM

On 28.10.2007 14:16, Mohit Sindhwani wrote:
> Sean O'Halpin wrote:
>> Here's one way (I'm sure there must be a simpler approach - just can't
>> think of it right now):
>>
>> array = ["apple", "banana", "apple", "orange"]
>> counts = array.inject(Hash.new {|h,k| h[k] = 0 }) { |hash, item| hash[item] += 1; hash }
>> p counts #=> {"apple"=>2, "banana"=>1, "orange"=>1}
>> p counts.select { |k,v| v > 1 }.map{ |k, v| k}.flatten #=> ["apple"]

irb(main):007:0> array = %w{apple banana apple orange}
=> ["apple", "banana", "apple", "orange"]
irb(main):008:0> array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
=> ["apple"]

Kind regards

robert
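
On newer Rubies the counting step can likely be left to Enumerable#tally, which did not exist when this thread was written. The sketch below assumes Ruby 2.7 or later; the names counts and dups are illustrative.

```ruby
# Assumes Ruby 2.7 or later, where Enumerable#tally is available.
array = %w[apple banana apple orange]

# tally builds the same element => count hash as the inject version.
counts = array.tally
# Keep only the elements that occur more than once.
dups = counts.select { |_, n| n > 1 }.keys
p dups   # prints ["apple"]
```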

Shuaib Zahda

10/28/2007 1:48:00 PM

Thanks a lot guys.
It works.

I really appreciate your help

Cheers
Shuaib
--
Posted via http://www.ruby-....

Harry Kakueki

10/28/2007 1:55:00 PM


arr,dup = ["apple", "banana", "apple", "orange"],[]
(arr.length-1).times do
  a = arr.shift
  dup << a if arr.include?(a)
end
p dup.uniq

Harry

--
A Look into Japanese Ruby List in English
http://www.ka...
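
The loop above empties arr as it goes, since each pass calls shift. If the original array has to stay intact, a non-mutating variant along these lines may be enough for small inputs. It is a sketch, not a replacement for the hash-based answers: Array#count rescans the array for every element, so the cost grows quadratically.

```ruby
arr = ["apple", "banana", "apple", "orange"]

# Array#count(obj) scans the whole array each time, so this is quadratic
# in the array length, but it leaves arr untouched.
dups = arr.select { |e| arr.count(e) > 1 }.uniq
p dups   # prints ["apple"]
p arr    # prints ["apple", "banana", "apple", "orange"]
```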

Sean O'Halpin

10/28/2007 5:45:00 PM

On 10/28/07, Mohit Sindhwani <mo_mail@onghu.com> wrote:
> I so have to get the hang of inject, flatten and map.
>
> Cheers,
> Mohit.
> 10/28/2007 | 9:16 PM.

Hi,

They are definitely worth looking into - inject in particular is a
powerful tool (Robert Klemme can make it do anything!). However, the
following benchmark shows that a slight modification of your approach
is actually pretty efficient. (The modification is to store the
duplicates in a hash rather than an array so you can return the list
of duplicates using Hash#keys).

Regards,
Sean

# Mohit Sindhwani (with slight adjustment)
def duplicates_1(array)
  seen = { }
  duplicates = { }
  array.each {|item| seen.key?(item) ? duplicates[item] = true : seen[item] = true}
  duplicates.keys
end

# Robert Klemme
def duplicates_2(array)
  array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
end

# from facets
def duplicates_3(array)
  array.inject(Hash.new(0)){|h,v| h[v]+=1; h}.reject{|k,v| v==1}.keys
end

require 'benchmark'

def do_benchmark(title, n, methods, *args, &block)
  puts '-' * 40
  puts title
  puts '-' * 40
  Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
    methods.each do |meth|
      x.report(meth.to_s) { n.times do send(meth, *args, &block) end }
    end
  end
end

# get some data (Ubuntu specific I guess - YMMV)
array = File.read('/etc/dictionaries-common/words').split(/\n/)

# test w/o dups
do_benchmark('no duplicates', 10, [:duplicates_1, :duplicates_2, :duplicates_3], array)

# create some duplicates
array = array[0..999] * 100
do_benchmark('duplicates', 10, [:duplicates_1, :duplicates_2, :duplicates_3], array)

__END__
$ ruby bm-duplicates.rb
----------------------------------------
no duplicates
----------------------------------------
                  user     system      total        real
duplicates_1  2.200000   0.010000   2.210000 (  2.215057)
duplicates_2  5.820000   0.000000   5.820000 (  5.812414)
duplicates_3  6.580000   0.010000   6.590000 (  6.586708)
----------------------------------------
duplicates
----------------------------------------
                  user     system      total        real
duplicates_1  1.560000   0.000000   1.560000 (  1.562587)
duplicates_2  2.660000   0.000000   2.660000 (  2.665301)
duplicates_3  2.590000   0.000000   2.590000 (  2.595189)
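
For anyone without /etc/dictionaries-common/words, a rough way to rerun the comparison is to generate the input instead. The sketch below is an assumption-laden stand-in, not the benchmark above: the method names, the synthetic "wordN" strings and the sizes are all invented, and timings will differ by machine and Ruby version.

```ruby
require 'benchmark'

# Same idea as duplicates_1 above: one hash for items seen, one for repeats.
def dups_seen_hash(array)
  seen = {}
  dups = {}
  array.each do |item|
    if seen.key?(item)
      dups[item] = true
    else
      seen[item] = true
    end
  end
  dups.keys
end

# Same idea as duplicates_2 above: count everything, then drop the singles.
def dups_counting(array)
  array.inject(Hash.new(0)) { |h, e| h[e] += 1; h }.delete_if { |_, v| v == 1 }.keys
end

# Synthetic data (an assumption, standing in for the dictionary file):
# 50,000 distinct strings, then the first 1,000 of them repeated 50 times.
unique   = (1..50_000).map { |i| "word#{i}" }
repeated = unique[0..999] * 50

Benchmark.bm(18) do |x|
  x.report('seen hash, unique') { dups_seen_hash(unique) }
  x.report('counting, unique')  { dups_counting(unique) }
  x.report('seen hash, dups')   { dups_seen_hash(repeated) }
  x.report('counting, dups')    { dups_counting(repeated) }
end
```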

Mohit Sindhwani

10/28/2007 6:15:00 PM


Thanks Sean! Makes me feel quite nice about it.

So, hashes are faster than arrays?

Cheers,
Mohit.
10/29/2007 | 2:13 AM.




Sean O'Halpin

10/28/2007 6:47:00 PM

On 10/28/07, Mohit Sindhwani <mo_mail@onghu.com> wrote:
>
> Thanks Sean! Makes me feel quite nice about it.
>
> So, hashes are faster than arrays?
>
> Cheers,
> Mohit.
> 10/29/2007 | 2:13 AM.

It depends what you're doing with them and how big they are. But in
this instance, I changed your solution to use a hash because you were
appending the duplicates to an array which resulted in adding an entry
to that array every time you detected a duplicate. This didn't show up
in your example because your data contained at most two instances of
an item. If you change your example to:

array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow", "apple", "apple"]
h = Hash.new
duplicates = []

array.each {|item|
  if h.has_key?(item) then
    duplicates << item
  else
    h[item] = 0 # it doesn't matter what we store
  end
}

puts duplicates

it outputs

apple
cow
apple
apple

which is probably not what you want.

Regards,
Sean

Jimmy Kofler

10/28/2007 9:32:00 PM

> Duplicate elements in array
> Posted by Shuaib Zahda (shuaib85) on 28.10.2007 13:47
> Hello
>
> I am trying to output the duplicate elements in an array. I looked into
> the api of ruby I found uniq method which outputs the array with no
> duplication. What i want is to know which elements is duplicated.

Here's yet another way to do it:
http://snippets.dzone.com/posts...

Cheers,

j.k.
--
Posted via http://www.ruby-....

Mohit Sindhwani

10/29/2007 3:44:00 AM

Thanks for the explanation, Sean. Actually, I guess it's not clear if
the OP wants to know each occurrence of the duplicates or just the list
of duplicates. But, there are now solutions for both cases!

Cheers,
Mohit.
10/29/2007 | 11:44 AM.
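
The two readings of the question can be put side by side in one small sketch. The variable names seen and repeats are illustrative; the data is the nine-element example from Sean's post.

```ruby
array = ["apple", "banana", "apple", "orange", "fat", "cow", "cow", "apple", "apple"]

seen = {}
repeats = []
array.each do |item|
  if seen.key?(item)
    repeats << item      # every occurrence after the first
  else
    seen[item] = true
  end
end

# Case 1: each repeated occurrence, in the order encountered.
p repeats        # prints ["apple", "cow", "apple", "apple"]
# Case 2: just the list of values that are duplicated.
p repeats.uniq   # prints ["apple", "cow"]
```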