
comp.lang.ruby

Removing duplicates and substrings from an array

Sammy Larbi

11/26/2007 3:15:00 PM

Note: parts of this message were removed by the gateway to make it a legal Usenet post.

I've got an array of strings, say like:

["Bob", "John", "Bobby", "John"]

I want to remove duplicates and elements that are substrings of other
elements. Therefore, the above array would become:

["John","Bobby"]

(order doesn't really matter to me, BTW)

Right now, this is what I'm doing:

def remove_duplicates_and_subsequences(some_array)
  result = []
  some_array.each_index do |i|
    (some_array.length-1).downto 0 do |j|
      some_array.delete_at(j) if i != j &&
        some_array[i].index(some_array[j])
    end
  end
  return result
end

Is there a better way to do that? I feel like I should be using select or
reject, but can't think of a way to do it.

Thanks,
Sammy Larbi

11 Answers

Siep Korteling

11/26/2007 3:21:00 PM

Sam Larbi wrote:
> I've got an array of strings, say like:
>
> ["Bob", "John", "Bobby", "John"]
>
> I want to remove duplicates and elements that are substrings of other
> elements. Therefore, the above array would become:
>
> ["John","Bobby"]
>

["Bob", "John", "Bobby", "John"].uniq!

(or uniq )
--
Posted via http://www.ruby-....

Shairon Toledo

11/26/2007 3:22:00 PM

Sam Larbi wrote:
> I've got an array of strings, say like:
>
> ["Bob", "John", "Bobby", "John"]
>
> I want to remove duplicates and elements that are substrings of other
> elements. Therefore, the above array would become:
>
> ["John","Bobby"]
>
> (order doesn't really matter to me, BTW)
>
> Right now, this is what I'm doing:
>
> def remove_duplicates_and_subsequences(some_array)
> result = []
> some_array.each_index do |i|
> (some_array.length-1).downto 0 do |j|
> some_array.delete_at(j) if i != j &&
> some_array[i].index(some_array[j])
> end
> end
> return result
> end
>
> Is there a better way to do that? I feel like I should be using select
> or
> reject, but can't think of a way to do it.
>
> Thanks,
> Sammy Larbi


Have you tried the uniq method?
<code>
[1,2,3,4,1,3].uniq # => [1, 2, 3, 4]
</code>
--
Posted via http://www.ruby-....

Christian von Kleist

11/26/2007 3:28:00 PM

On Nov 26, 2007 10:15 AM, Sam Larbi <slarbi@gmail.com> wrote:
> I've got an array of strings, say like:
>
> ["Bob", "John", "Bobby", "John"]
>
> I want to remove duplicates and elements that are substrings of other
> elements. Therefore, the above array would become:
>
> ["John","Bobby"]
>
> (order doesn't really matter to me, BTW)
>
> Right now, this is what I'm doing:
>
> def remove_duplicates_and_subsequences(some_array)
> result = []
> some_array.each_index do |i|
> (some_array.length-1).downto 0 do |j|
> some_array.delete_at(j) if i != j &&
> some_array[i].index(some_array[j])
> end
> end
> return result
> end
>
> Is there a better way to do that? I feel like I should be using select or
> reject, but can't think of a way to do it.
>
> Thanks,
> Sammy Larbi
>


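You can use Array#uniq to remove duplicates. For removing words that
are contained in other words, I would sort the array, then compare each
string with the one that follows it:

strings = strings.uniq.sort
good_strings = []
strings.each_index do |i|
  # keep strings[i] unless the next string contains it (the last one is always kept)
  following = strings[i + 1]
  good_strings << strings[i] unless following && following.include?(strings[i])
end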

...or something like that.
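
As a rough illustration of the sort-then-compare idea above: checking only the next string in sorted order reliably catches prefixes (after sorting, a string that is a prefix of another is immediately followed by a string that starts with it), but it can miss substrings that sit in the middle or at the end of a longer string. The sketch below compares that approach with a full pairwise check. The method names neighbour_filter and pairwise_filter are made up for this example and do not come from the thread.

```ruby
# Hypothetical helper: compare each string only with its successor in sorted order.
def neighbour_filter(strings)
  sorted = strings.uniq.sort
  kept = []
  sorted.each_with_index do |s, i|
    following = sorted[i + 1]
    # The last string has no successor, so it is always kept.
    kept << s unless following && following.include?(s)
  end
  kept
end

# Hypothetical helper: compare every string with every other string.
def pairwise_filter(strings)
  unique = strings.uniq
  unique.reject { |a| unique.any? { |b| b != a && b.include?(a) } }
end

names = ["Bob", "John", "Bobby", "John"]
p neighbour_filter(names)   # => ["Bobby", "John"]
p pairwise_filter(names)    # => ["John", "Bobby"]

# "ob" is a substring of "Bob" but not a prefix, and sorts after it.
tricky = ["ob", "Bob"]
p neighbour_filter(tricky)  # => ["Bob", "ob"]
p pairwise_filter(tricky)   # => ["Bob"]
```

The neighbour check needs only one pass after the sort, while the pairwise check is quadratic in the number of strings, so which one fits depends on whether non-prefix substrings matter for the data at hand.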

Marc Heiler

11/26/2007 3:38:00 PM

I think there could also be a .map solution, but I can't figure it out
right now. .uniq really seems the simplest and most elegant fit for the
problem at hand.
--
Posted via http://www.ruby-....

yermej

11/26/2007 5:12:00 PM

On Nov 26, 9:15 am, Sam Larbi <sla...@gmail.com> wrote:
> Note: parts of this message were removed by the gateway to make it a legal Usenet post.
>
> I've got an array of strings, say like:
>
> ["Bob", "John", "Bobby", "John"]
>
> I want to remove duplicates and elements that are substrings of other
> elements. Therefore, the above array would become:
>
> ["John","Bobby"]
>
> (order doesn't really matter to me, BTW)
>
> Right now, this is what I'm doing:
>
> def remove_duplicates_and_subsequences(some_array)
> result = []
> some_array.each_index do |i|
> (some_array.length-1).downto 0 do |j|
> some_array.delete_at(j) if i != j &&
> some_array[i].index(some_array[j])
> end
> end
> return result
> end
>
> Is there a better way to do that? I feel like I should be using select or
> reject, but can't think of a way to do it.
>
> Thanks,
> Sammy Larbi

This should work:

arr = ["Bob", "John", "Bobby", "John"]
arr.uniq!
arr.reject {|a| arr.any? {|b| b != a and b =~ /#{a}/}}

Jeremy

Lionel Bouton

11/26/2007 5:20:00 PM

yermej wrote the following on 26.11.2007 18:15 :
> On Nov 26, 9:15 am, Sam Larbi <sla...@gmail.com> wrote:
>
>> Note: parts of this message were removed by the gateway to make it a legal Usenet post.
>>
>> I've got an array of strings, say like:
>>
>> ["Bob", "John", "Bobby", "John"]
>>
>> I want to remove duplicates and elements that are substrings of other
>> elements. Therefore, the above array would become:
>>
>> ["John","Bobby"]
> This should work:
>
> arr = ["Bob", "John", "Bobby", "John"]
> arr.uniq!
> arr.reject {|a| arr.any? {|b| b != a and b =~ /#{a}/}}
>

You'll have surprises if there's a "." element...

arr = ["Bob", "John", "Bobby", "John"]
arr.uniq!
arr.reject {|a| arr.any? {|b| b != a and a.index(b) } }


seems safer and quicker to me.

Lionel
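
To make the "." surprise concrete, here is a small sketch (the sample array is invented for illustration). Interpolating a string into a regex treats it as a pattern, so "." matches any character and gets rejected even though no other element contains a literal dot. Escaping with Regexp.escape, or using a plain substring test, avoids that.

```ruby
arr = [".", "Bob", "Bobby"]

# Raw interpolation: "." becomes the regex wildcard and matches "Bob".
raw = arr.reject { |a| arr.any? { |b| b != a && b =~ /#{a}/ } }

# Escaped interpolation: "." is matched literally.
escaped = arr.reject { |a| arr.any? { |b| b != a && b =~ /#{Regexp.escape(a)}/ } }

# Plain substring test: no regex involved at all.
plain = arr.reject { |a| arr.any? { |b| b != a && b.include?(a) } }

p raw      # => ["Bobby"]
p escaped  # => [".", "Bobby"]
p plain    # => [".", "Bobby"]
```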

Lionel Bouton

11/26/2007 5:25:00 PM

Lionel Bouton wrote the following on 26.11.2007 18:20 :
> yermej wrote the following on 26.11.2007 18:15 :
>
>> On Nov 26, 9:15 am, Sam Larbi <sla...@gmail.com> wrote:
>>
>>
>>> Note: parts of this message were removed by the gateway to make it a legal Usenet post.
>>>
>>> I've got an array of strings, say like:
>>>
>>> ["Bob", "John", "Bobby", "John"]
>>>
>>> I want to remove duplicates and elements that are substrings of other
>>> elements. Therefore, the above array would become:
>>>
>>> ["John","Bobby"]
>>>
>> This should work:
>>
>> arr = ["Bob", "John", "Bobby", "John"]
>> arr.uniq!
>> arr.reject {|a| arr.any? {|b| b != a and b =~ /#{a}/}}
>>
>>
>
> You'll have surprises if there's a "." element...
>
> arr = ["Bob", "John", "Bobby", "John"]
> arr.uniq!
> arr.reject {|a| arr.any? {|b| b != a and a.index(b) } }
>

Oops: I misread the question.

It should be b.index(a) (I rejected the superstrings instead of the
substrings).

Lionel
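
For reference, a quick sketch of how the two directions differ on the sample array from the question. Note that index returns 0 for a match at the start of the string, and 0 is truthy in Ruby, so it works as a condition here.

```ruby
arr = ["Bob", "John", "Bobby", "John"].uniq

# a.index(b): rejects a when it CONTAINS another element (drops superstrings).
superstrings_removed = arr.reject { |a| arr.any? { |b| b != a && a.index(b) } }

# b.index(a): rejects a when another element contains it (drops substrings).
substrings_removed = arr.reject { |a| arr.any? { |b| b != a && b.index(a) } }

p superstrings_removed  # => ["Bob", "John"]
p substrings_removed    # => ["John", "Bobby"]
```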

Sebastian Hungerecker

11/26/2007 5:27:00 PM

Lionel Bouton wrote:
> arr.reject {|a| arr.any? {|b| b != a and a.index(b) } }

I'd change that index to include?, since you don't actually need the
position here.


--
Jabber: sepp2k@jabber.org
ICQ: 205544826

yermej

11/26/2007 5:39:00 PM

On Nov 26, 11:20 am, Lionel Bouton <lionel-subscript...@bouton.name>
wrote:
> yermej wrote the following on 26.11.2007 18:15 :
>
>
>
> > On Nov 26, 9:15 am, Sam Larbi <sla...@gmail.com> wrote:
>
> >> Note: parts of this message were removed by the gateway to make it a legal Usenet post.
>
> >> I've got an array of strings, say like:
>
> >> ["Bob", "John", "Bobby", "John"]
>
> >> I want to remove duplicates and elements that are substrings of other
> >> elements. Therefore, the above array would become:
>
> >> ["John","Bobby"]
> > This should work:
>
> > arr = ["Bob", "John", "Bobby", "John"]
> > arr.uniq!
> > arr.reject {|a| arr.any? {|b| b != a and b =~ /#{a}/}}
>
> You'll have surprises if there's a "." element...
>
> arr = ["Bob", "John", "Bobby", "John"]
> arr.uniq!
> arr.reject {|a| arr.any? {|b| b != a and a.index(b) } }
>
> seems safer and quicker to me.
>
> Lionel

Good point. Thank you.

Jeremy

Lionel Bouton

11/26/2007 5:51:00 PM

Sebastian Hungerecker wrote the following on 26.11.2007 18:27 :
> Lionel Bouton wrote:
>
>> arr.reject {|a| arr.any? {|b| b != a and a.index(b) } }
>>
>
> I'd make that index into include? because you don't really care about the
> index here.
>

I agree, the code is then easier to read too.

Lionel
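
Pulling the suggestions in this thread together, one possible version looks like the sketch below. The method name remove_duplicates_and_substrings is made up for this example, and it uses the non-destructive uniq rather than uniq! so the caller's array is left untouched (an assumption about what is wanted, not something the original poster asked for).

```ruby
# Hypothetical method name; combines uniq, reject and include? as discussed above.
def remove_duplicates_and_substrings(strings)
  unique = strings.uniq
  unique.reject do |candidate|
    # Drop candidate if some other, different string contains it.
    unique.any? { |other| other != candidate && other.include?(candidate) }
  end
end

names = ["Bob", "John", "Bobby", "John"]
p remove_duplicates_and_substrings(names)  # => ["John", "Bobby"]
p names                                    # => ["Bob", "John", "Bobby", "John"]
```

This compares every string with every other one, so it is quadratic in the size of the array, which should be fine for short lists. An empty string counts as a substring of every string, so it is dropped whenever the array contains anything else.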