Asp Forum - Removing duplicates and substrings from an array

Sammy Larbi

11/26/2007 3:15:00 PM

Note: parts of this message were removed by the gateway to make it a legal Usenet post.

I've got an array of strings, say like:

["Bob", "John", "Bobby", "John"]

I want to remove duplicates and elements that are substrings of other
elements. Therefore, the above array would become:

["John","Bobby"]

(order doesn't really matter to me, BTW)

Right now, this is what I'm doing:

def remove_duplicates_and_subsequences(some_array)
  result = []
  some_array.each_index do |i|
    (some_array.length-1).downto 0 do |j|
      some_array.delete_at(j) if i != j &&
        some_array[i].index(some_array[j])
    end
  end
  return result
end

Is there a better way to do that? I feel like I should be using select or
reject, but can't think of a way to do it.

Thanks,
Sammy Larbi

11 Answers

Siep Korteling

11/26/2007 3:21:00 PM

Sam Larbi wrote:
> I've got an array of strings, say like:
>
> ["Bob", "John", "Bobby", "John"]
>
> I want to remove duplicates and elements that are substrings of other
> elements. Therefore, the above array would become:
>
> ["John","Bobby"]
>

["Bob", "John", "Bobby", "John"].uniq!

(or uniq )
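
A small sketch of the difference between the two, in case it matters which one you pick: uniq returns a new array, while uniq! changes the receiver in place and returns nil when there was nothing to remove. The variable names are only for illustration.

```ruby
names = ["Bob", "John", "Bobby", "John"]

# uniq returns a new array and leaves the original untouched.
deduped = names.uniq

# uniq! removes duplicates in place and returns the receiver...
in_place = names.uniq!

# ...but returns nil when no duplicates were found, so avoid chaining on it.
second_call = names.uniq!
```

Either way, "Bob" is still in the array afterwards; uniq only covers the duplicate half of the question.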
--
Posted via http://www.ruby-....

Shairon Toledo

11/26/2007 3:22:00 PM

Sam Larbi wrote:
> I've got an array of strings, say like:
>
> ["Bob", "John", "Bobby", "John"]
>
> I want to remove duplicates and elements that are substrings of other
> elements. Therefore, the above array would become:
>
> ["John","Bobby"]
>
> (order doesn't really matter to me, BTW)
>
> Right now, this is what I'm doing:
>
> def remove_duplicates_and_subsequences(some_array)
> result = []
> some_array.each_index do |i|
> (some_array.length-1).downto 0 do |j|
> some_array.delete_at(j) if i != j &&
> some_array[i].index(some_array[j])
> end
> end
> return result
> end
>
> Is there a better way to do that? I feel like I should be using select
> or
> reject, but can't think of a way to do it.
>
> Thanks,
> Sammy Larbi

Have you tried the uniq method?

[1,2,3,4,1,3].uniq  # => [1, 2, 3, 4]

Christian von Kleist

11/26/2007 3:28:00 PM

On Nov 26, 2007 10:15 AM, Sam Larbi <slarbi@gmail.com> wrote:
> I've got an array of strings, say like:
>
> ["Bob", "John", "Bobby", "John"]
>
> I want to remove duplicates and elements that are substrings of other
> elements. Therefore, the above array would become:
>
> ["John","Bobby"]
>
> (order doesn't really matter to me, BTW)
>
> Right now, this is what I'm doing:
>
> def remove_duplicates_and_subsequences(some_array)
> result = []
> some_array.each_index do |i|
> (some_array.length-1).downto 0 do |j|
> some_array.delete_at(j) if i != j &&
> some_array[i].index(some_array[j])
> end
> end
> return result
> end
>
> Is there a better way to do that? I feel like I should be using select or
> reject, but can't think of a way to do it.
>
> Thanks,
> Sammy Larbi
>

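You can use Array#uniq to remove duplicates. For removing words that
are contained in other words, I would sort the array, then for each
string in the array:

sorted = strings.uniq.sort
good_strings = []
sorted.each_with_index do |s, i|
  # Keep s unless the next sorted string contains it. This reliably
  # catches prefixes ("Bob" before "Bobby"), not every substring.
  following = sorted[i + 1]
  good_strings << s unless following && following.include?(s)
end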

...or something like that.
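
To illustrate why the "or something like that" matters: after sorting, a string that is a prefix of another lands directly before it, so comparing neighbours finds it. A substring taken from the middle or end of another word does not end up next to it. A sketch, with a made-up helper name and made-up input:

```ruby
# Hypothetical helper: keep a string unless the next one in sorted order contains it.
def drop_if_in_next(strings)
  sorted = strings.uniq.sort
  sorted.each_with_index.reject do |s, i|
    following = sorted[i + 1]
    following && following.include?(s)
  end.map(&:first)
end

prefix_case = drop_if_in_next(["Bob", "John", "Bobby", "John"])  # "Bob" is dropped
middle_case = drop_if_in_next(["obb", "Bobby"])                   # "obb" is kept
```

So the sort-based idea is fine when only prefixes matter, but a full pairwise comparison is needed for substrings in general.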

Marc Heiler

11/26/2007 3:38:00 PM

I think there could also be a .map solution, but I can't figure it out
right now. .uniq really seems the simplest and most elegant for the
problem at hand.

yermej

11/26/2007 5:12:00 PM

On Nov 26, 9:15 am, Sam Larbi <sla...@gmail.com> wrote:
> Note: parts of this message were removed by the gateway to make it a legal Usenet post.
>
> I've got an array of strings, say like:
>
> ["Bob", "John", "Bobby", "John"]
>
> I want to remove duplicates and elements that are substrings of other
> elements. Therefore, the above array would become:
>
> ["John","Bobby"]
>
> (order doesn't really matter to me, BTW)
>
> Right now, this is what I'm doing:
>
> def remove_duplicates_and_subsequences(some_array)
> result = []
> some_array.each_index do |i|
> (some_array.length-1).downto 0 do |j|
> some_array.delete_at(j) if i != j &&
> some_array[i].index(some_array[j])
> end
> end
> return result
> end
>
> Is there a better way to do that? I feel like I should be using select or
> reject, but can't think of a way to do it.
>
> Thanks,
> Sammy Larbi

This should work:

arr = ["Bob", "John", "Bobby", "John"]
arr.uniq!
arr.reject {|a| arr.any? {|b| b != a and b =~ /#{a}/}}

Jeremy

Lionel Bouton

11/26/2007 5:20:00 PM

yermej wrote the following on 26.11.2007 18:15 :
> On Nov 26, 9:15 am, Sam Larbi <sla...@gmail.com> wrote:
>
>> Note: parts of this message were removed by the gateway to make it a legal Usenet post.
>>
>> I've got an array of strings, say like:
>>
>> ["Bob", "John", "Bobby", "John"]
>>
>> I want to remove duplicates and elements that are substrings of other
>> elements. Therefore, the above array would become:
>>
>> ["John","Bobby"]
> This should work:
>
> arr = ["Bob", "John", "Bobby", "John"]
> arr.uniq!
> arr.reject {|a| arr.any? {|b| b != a and b =~ /#{a}/}}
>

You'll have surprises if there's a "." element...

arr = ["Bob", "John", "Bobby", "John"]
arr.uniq!
arr.reject {|a| arr.any? {|b| b != a and a.index(b) } }

seems safer and quicker to me.
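
To make the surprise concrete: interpolating a string into a regex treats characters such as "." as metacharacters. A sketch with made-up input; Regexp.escape is the standard way to neutralise them if a regex is really wanted.

```ruby
arr = [".", "John"]

# "." interpolated into a regex matches any character, so it "matches" inside "John"
# and gets rejected even though "John" contains no literal dot.
unescaped = arr.reject { |a| arr.any? { |b| b != a and b =~ /#{a}/ } }

# Escaping the interpolated string makes the regex look for a literal ".".
escaped = arr.reject { |a| arr.any? { |b| b != a and b =~ /#{Regexp.escape(a)}/ } }
```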

Lionel

Lionel Bouton

11/26/2007 5:25:00 PM

Lionel Bouton wrote the following on 26.11.2007 18:20 :
> yermej wrote the following on 26.11.2007 18:15 :
>
>> On Nov 26, 9:15 am, Sam Larbi <sla...@gmail.com> wrote:
>>
>>
>>> Note: parts of this message were removed by the gateway to make it a legal Usenet post.
>>>
>>> I've got an array of strings, say like:
>>>
>>> ["Bob", "John", "Bobby", "John"]
>>>
>>> I want to remove duplicates and elements that are substrings of other
>>> elements. Therefore, the above array would become:
>>>
>>> ["John","Bobby"]
>>>
>> This should work:
>>
>> arr = ["Bob", "John", "Bobby", "John"]
>> arr.uniq!
>> arr.reject {|a| arr.any? {|b| b != a and b =~ /#{a}/}}
>>
>>
>
> You'll have surprises if there's a "." element...
>
> arr = ["Bob", "John", "Bobby", "John"]
> arr.uniq!
> arr.reject {|a| arr.any? {|b| b != a and a.index(b) } }
>

Oops: I misread the question.

It should be b.index(a) (I rejected the superstrings instead of the
substrings).
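
For comparison, a sketch of both directions side by side. The variable names superstrings_removed and substrings_removed are made up here, and since reject returns a new array the results are captured rather than discarded.

```ruby
arr = ["Bob", "John", "Bobby", "John"]
arr.uniq!

# a.index(b): rejects a when it CONTAINS another element (drops "Bobby").
superstrings_removed = arr.reject { |a| arr.any? { |b| b != a and a.index(b) } }

# b.index(a): rejects a when another element contains IT (drops "Bob").
substrings_removed = arr.reject { |a| arr.any? { |b| b != a and b.index(a) } }
```

Note that index returning 0 still counts as a match, because 0 is truthy in Ruby.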

Lionel

Sebastian Hungerecker

11/26/2007 5:27:00 PM

Lionel Bouton wrote:
> arr.reject {|a| arr.any? {|b| b != a and a.index(b) } }

I'd make that index into include? because you don't really care about the
index here.

--
Jabber: sepp2k@jabber.org
ICQ: 205544826

yermej

11/26/2007 5:39:00 PM

On Nov 26, 11:20 am, Lionel Bouton <lionel-subscript...@bouton.name>
wrote:
> yermej wrote the following on 26.11.2007 18:15 :
>
>
>
> > On Nov 26, 9:15 am, Sam Larbi <sla...@gmail.com> wrote:
>
> >> Note: parts of this message were removed by the gateway to make it a legal Usenet post.
>
> >> I've got an array of strings, say like:
>
> >> ["Bob", "John", "Bobby", "John"]
>
> >> I want to remove duplicates and elements that are substrings of other
> >> elements. Therefore, the above array would become:
>
> >> ["John","Bobby"]
> > This should work:
>
> > arr = ["Bob", "John", "Bobby", "John"]
> > arr.uniq!
> > arr.reject {|a| arr.any? {|b| b != a and b =~ /#{a}/}}
>
> You'll have surprises if there's a "." element...
>
> arr = ["Bob", "John", "Bobby", "John"]
> arr.uniq!
> arr.reject {|a| arr.any? {|b| b != a and a.index(b) } }
>
> seems safer and quicker to me.
>
> Lionel

Good point. Thank you.

Jeremy

Lionel Bouton

11/26/2007 5:51:00 PM

Sebastian Hungerecker wrote the following on 26.11.2007 18:27 :
> Lionel Bouton wrote:
>
>> arr.reject {|a| arr.any? {|b| b != a and a.index(b) } }
>>
>
> I'd make that index into include? because you don't really care about the
> index here.
>

I agree, the code is then easier to read too.
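
Putting the suggestions from this thread together, a sketch might look like this. The method name is made up for illustration, and unlike the original version it leaves its argument unmodified.

```ruby
# Hypothetical name; returns a new array without duplicates and without
# strings that are contained in another element.
def remove_duplicates_and_substrings(strings)
  unique = strings.uniq
  unique.reject do |candidate|
    unique.any? { |other| other != candidate && other.include?(candidate) }
  end
end

cleaned = remove_duplicates_and_substrings(["Bob", "John", "Bobby", "John"])
# cleaned is ["John", "Bobby"]
```

This compares every pair of elements, which should be fine for short arrays but grows quadratically with the array size.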

Lionel

comp.lang.ruby
