Asp Forum - using reg expr with array.index

Esmail

12/26/2007 10:33:00 PM

If I have an array ar of strings that contains
for instance

aaaa
bbbb
cccc
>dddd
eeee
dddd
cccc
etc.

is there a way to use ar.index with a regular
expression to get the index of the line >dddd?

I've tried ar.index(/^>/) and ar.index(/^\>/) without
much luck.

In other words, I'm trying to match the line whose first
character is a >

Thanks.

23 Answers

MonkeeSage

12/27/2007 2:25:00 AM

['aaaa', '>bbbb', 'cccc'].find { | e | e =~ /^>/ }

Regards,
Jordan
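
A note on why the original attempts come back empty: Array#index(obj) compares each element to its argument with ==, and a String is never == to a Regexp, so nothing matches. The find call above does apply the regexp, but it returns the matching element rather than its position. A minimal sketch using the sample array from the question (the variable names are only illustrative):

```ruby
ar = ['aaaa', 'bbbb', 'cccc', '>dddd', 'eeee', 'dddd', 'cccc']

# index(obj) tests element == obj; a String never equals a Regexp.
by_regexp_arg = ar.index(/^>/)
p by_regexp_arg   # => nil

# find applies the block, but hands back the element, not its index.
element = ar.find { |e| e =~ /^>/ }
p element         # => ">dddd"
```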

Esmail

12/27/2007 4:20:00 AM

MonkeeSage wrote:
> ['aaaa', '>bbbb', 'cccc'].find { | e | e =~ /^>/ }
>

Hi Jordan,

Is there a way to use this regular expression to return the
index value of the position where this string is found? That
is the main thing I am interested in.

It seems there ought to be an easy way ('cept I don't know it :-)

Esmail

MonkeeSage

12/27/2007 5:00:00 AM

On Dec 26, 10:20 pm, Esmail <ebonak_de...@hotmail.com> wrote:
> Hi Jordan,
>
> Is there a way to use this regular expression to return the
> index value of the position where this string is found? That
> is the main thing I am interested in.
>
> It seems there ought to be an easy way ('cept I don't know it :-)
>
> Esmail

Hi,

There's no built-in way that I'm aware of. You have to iterate over
the array yourself. If you want all the indices you could do something
like...

indices = []
['aaaa', '>bbbb', '>cccc'].each_with_index { | e, i |
  indices << i if e =~ /^>/
}
p indices # => [1, 2]

But given the description of what you're trying to do in the other
thread, you probably just want to use Array#reject...

a = ['aaaa', '>bbbb', 'cccc'].reject { | e | e =~ /^>/ }
p a # => ["aaaa", "cccc"]

Regards,
Jordan
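
For readers on later Ruby versions (1.8.7 and the 1.9 series onward, to the best of my knowledge), Array#index also accepts a block, which gives the index directly without a manual loop. The sketch below assumes such a version; the variable names are illustrative only.

```ruby
ar = ['aaaa', '>bbbb', '>cccc']

# Block form of Array#index: position of the first element for which
# the block is true, or nil when nothing matches.
first = ar.index { |e| e =~ /^>/ }
p first   # => 1

# All matching positions, without a separate accumulator array.
all = ar.each_index.select { |i| ar[i] =~ /^>/ }
p all     # => [1, 2]
```

On Ruby 1.8.6, which was current when this thread was written, the each_with_index loop above remains the way to go.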

Esmail

12/27/2007 1:18:00 PM

Hi Jordan,

I didn't know about each_with_index until after I posted my last
message and read more on Ruby .. clearly I have to do more reading,
but I have found one of the best ways to learn is to do :-)

> There's no built-in way that I'm aware of. You have to iterate over
> the array yourself. If you want all the indices you could do something
> like...
>
> indices = []
> ['aaaa', '>bbbb', '>cccc'].each_with_index { | e, i |
> indices << i if e =~ /^>/
> }
> p indices # => [1, 2]
>
> But given the description of what you're trying to do in the other
> thread, you probably just want to use Array#reject...
>
> a = ['aaaa', '>bbbb', 'cccc'].reject { | e | e =~ /^>/ }
> p a # => ["aaaa", "cccc"]

This would delete only the one element, but I am trying to delete a range
of data (a record). I may have duplicate records, so I am trying to get
rid of them. They have different identifiers, each starting with a '>'.
Here's a test file that mimics this:

>88888/Bla08/the/rest8
888888888888888
888888888888888
888888888888888
888888888888888
888888888888888
88888 -- last line --
>77777/Bla07/the/rest7
777777777777777
777777777777777
777777777777777
777777777777777
777777777777777
77777 -- last line --
>66666/Bla06/the/rest6
666666666666666
666666666666666
666666666666666
666666666666666
666666666666666
66666 -- last line --
>77777/Bla07/the/rest7
777777777777777
777777777777777
777777777777777
777777777777777
777777777777777
77777 -- last line --
>

(I add the last > and later remove it)

So, this is what I came up with (with suggestions from you):

######################################
# delete duplicate records
######################################
def deleteDuplicates(data, dups)

  dups.each do |name|
    puts "\n****deleting duplicate \"#{name}\"...\n"
    s = data.index(name)
    e = 0
    data[s+1..-1].each_with_index{ |v, i|
      if v =~ /^>/
        e = i
        break
      end
    }

    puts "deleting ... ", data[s..s+e], "..done"
    data.slice!(s..s+e)
  end

  data
end
######################################

What do you think? It seems to work, but I'm always interested in
learning to do things better.

Thanks again!

Esmail

Robert Klemme

12/27/2007 2:23:00 PM

2007/12/27, MonkeeSage <MonkeeSage@gmail.com>:
> There's no built-in way that I'm aware of.

How about this one - kind of pseudo built in. :-)

irb(main):007:0> a=['aaaa', '>bbbb', 'cccc']
=> ["aaaa", ">bbbb", "cccc"]
irb(main):008:0> a.to_enum(:each_with_index).find {|e,i| /^>/ =~ e}.last
=> 1

A similar approach also works when looking for multiple indexes:

irb(main):009:0> a.to_enum(:each_with_index).select {|e,i| /^>|c+/ =~ e}.map {|e,i| i}
=> [1, 2]

But I agree, usually indexes are fairly seldom needed with Arrays.

Kind regards

robert

--
use.inject do |as, often| as.you_can - without end
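
To unpack the one-liner above: to_enum(:each_with_index) builds an enumerator that yields [element, index] pairs, find returns the first pair whose element matches, and last picks the index out of that pair. One caveat is that find returns nil when nothing matches, so calling last on the result raises NoMethodError. A small sketch that guards against that case, where first_index_matching is a hypothetical helper name and not part of Ruby:

```ruby
# Hypothetical helper, not part of Ruby: index of the first element
# matching pattern, or nil when there is no match.
def first_index_matching(array, pattern)
  pair = array.to_enum(:each_with_index).find { |e, _i| pattern =~ e }
  pair && pair.last   # pair is [element, index], or nil if nothing matched
end

p first_index_matching(['aaaa', '>bbbb', 'cccc'], /^>/)   # => 1
p first_index_matching(['aaaa', 'bbbb'], /^>/)            # => nil
```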

MonkeeSage

12/27/2007 2:45:00 PM

On Dec 27, 7:17 am, Esmail <ebonak_de...@hotmail.com> wrote:
> Hi Jordan,
>
> I didn't know about each_with_index until after I posted my last
> message and read more on Ruby .. clearly I have to do more reading,
> but I have found one of the best ways to learn is to do :-)
>
> > There's no built-in way that I'm aware of. You have to iterate over
> > the array yourself. If you want all the indices you could do something
> > like...
>
> > indices = []
> > ['aaaa', '>bbbb', '>cccc'].each_with_index { | e, i |
> > indices << i if e =~ /^>/
> > }
> > p indices # => [1, 2]
>
> > But given the description of what you're trying to do in the other
> > thread, you probably just want to use Array#reject...
>
> > a = ['aaaa', '>bbbb', 'cccc'].reject { | e | e =~ /^>/ }
> > p a # => ["aaaa", "cccc"]
>
> This would delete only the one element, but I am trying to delete a range
> of data (a record). I may have duplicate records, so I am trying to get
> rid of them. They have different identifiers, each starting with a '>'.
> Here's a test file that mimics this:
>
> >88888/Bla08/the/rest8
> 888888888888888
> 888888888888888
> 888888888888888
> 888888888888888
> 888888888888888
> 88888 -- last line --
> >77777/Bla07/the/rest7
> 777777777777777
> 777777777777777
> 777777777777777
> 777777777777777
> 777777777777777
> 77777 -- last line --
> >66666/Bla06/the/rest6
> 666666666666666
> 666666666666666
> 666666666666666
> 666666666666666
> 666666666666666
> 66666 -- last line --
> >77777/Bla07/the/rest7
> 777777777777777
> 777777777777777
> 777777777777777
> 777777777777777
> 777777777777777
> 77777 -- last line --
> >
>
> (I add the last > and later remove it)
>
> So, this is what I came up with (with suggestions from you):
>
> ######################################
> # delete duplicate records
> ######################################
> def deleteDuplicates(data, dups)
>
> dups.each do |name|
> puts "\n****deleting duplicate \"#{name}\"...\n"
> s = data.index(name)
> e = 0
> data[s+1..-1].each_with_index{ |v, i|
> if v =~ /^>/
> e = i
> break
> end
> }
>
> puts "deleting ... ", data[s..s+e], "..done"
> data.slice!(s..s+e)
> end
>
> data
> end
> ######################################
>
> What do you think? It seems to work, but I'm always interested in
> learning to do things better.
>
> Thanks again!
>
> Esmail

Hi Esmail,

A couple points:

- It's not very efficient to do all that iteration and slicing.

- The regexp won't work since #each and #each_with_index iterate over
lines and not characters (so v == " >...", so /^ >/ would be needed).

- #index returns nil if there is no matching index (error when you get
to s+1 in that case).

How about using Array#uniq, as in:

def no_dups(path)
  IO.read(path).split(" >").uniq.join(" >")
end
fixed = no_dups("testfile")
puts fixed

# =>
>88888/Bla08/the/rest8
888888888888888
888888888888888
888888888888888
888888888888888
888888888888888
88888 -- last line --
>77777/Bla07/the/rest7
777777777777777
777777777777777
777777777777777
777777777777777
777777777777777
77777 -- last line --
>66666/Bla06/the/rest6
666666666666666
666666666666666
666666666666666
666666666666666
666666666666666
66666 -- last line --
>

Regards,
Jordan
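
The same uniq idea can also be applied to data that is already an array of lines, as in deleteDuplicates. The sketch below is only one possible variant, and it makes two assumptions that are not in the posts above: every record starts with a header line whose first character is '>', and Enumerable#slice_before is available (Ruby 1.9.2 or later). The name unique_records is hypothetical.

```ruby
# Hypothetical helper: remove duplicate records from an array of lines,
# assuming each record begins with a header line starting with '>'.
def unique_records(lines)
  lines.slice_before { |line| line.start_with?('>') }  # header + body lines
       .to_a
       .uniq      # records are arrays of lines, compared by content
       .flatten
end

data = ['>77777/Bla07', '777', '>66666/Bla06', '666', '>77777/Bla07', '777']
p unique_records(data)
# => [">77777/Bla07", "777", ">66666/Bla06", "666"]
```

Because uniq compares whole records, two records with the same header but different body lines are both kept, and the first occurrence of each duplicate is the one that survives.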

Esmail

12/27/2007 11:01:00 PM

Hello again!

MonkeeSage wrote:
>
> Hi Esmail,
>
> A couple points:
>
> - It's not very efficient to do all that iteration and slicing.

Yes, I was afraid of this -- but this is the sort of thing I am
trying to learn. Ruby is rather high level compared to C and Java
so I need to adjust my approaches a little.

Knowing different ways of solving the same problem is useful.

> - The regexp won't work since #each and #each_with_index iterate over
> lines and not characters (so v == " >...", so /^ >/ would be needed).

Oops, my mistake, not sure how that got pasted like that; those > markers
really do start in the first column.

> - #index returns nil if there is no matching index (error when you get
> to s+1 in that case).
>
> How about using Array#uniq, as in:
>
> def no_dups(path)
> IO.read(path).split(" >").uniq.join(" >")
> end
> fixed = no_dups("testfile")
> puts fixed

Looks good, I've come across those methods before, but I have to
read up on the specific methods again.

Thanks for sharing your knowledge,

Esmail

Esmail

12/27/2007 11:02:00 PM

Robert Klemme wrote:
>
> How about this one - kind of pseudo built in. :-)
>
> irb(main):007:0> a=['aaaa', '>bbbb', 'cccc']
> => ["aaaa", ">bbbb", "cccc"]
> irb(main):008:0> a.to_enum(:each_with_index).find {|e,i| /^>/ =~ e}.last
> => 1
>
> A similar approach also works when looking for multiple indexes:
>
> irb(main):009:0> a.to_enum(:each_with_index).select {|e,i| /^>|c+/ =~
> e}.map {|e,i| i}
> => [1, 2]
>
> But I agree, usually indexes are fairly seldom needed with Arrays.

Hi there Robert,

Thanks for posting this, I'll have to dissect it carefully to see
what it exactly does. (Which is cool, more new stuff to learn :-)

Esmail

Esmail

12/28/2007 12:40:00 AM

> How about using Array#uniq, as in:
>
> def no_dups(path)
> IO.read(path).split(" >").uniq.join(" >")
> end
> fixed = no_dups("testfile")
> puts fixed

This is *neat* and elegant! Very cool .... thanks again.

Esmail

12/28/2007 1:18:00 AM

MonkeeSage wrote:
>
> def no_dups(path)
> IO.read(path).split(" >").uniq.join(" >")
> end
> fixed = no_dups("testfile")
> puts fixed

One more quick question (ha .. see, that's what you get
for being so helpful) - please feel free to ignore this.

For the above solution which I really like, is there an
easy way to get the duplicate records? (I'd like to display
the name lines ie the ones that start with > as a possible
check of what I am eliminating from the original data).

I know how to do this if I reread the file again and traverse
it but that's certainly not an efficient way to do this.

I am doing a lot of reading on Ruby right now, so I
may come across the solution; only reply if you are bored :-)

(ps: I suppose if there was a to_set and to_array functionality
in Ruby - for all I know there is - it would have provided yet
another approach to solve the original problem)
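
On the last two points: Ruby's standard library does ship a Set class (require 'set'), with to_set on enumerables and to_a to convert back. One way to list the duplicated name lines in a single pass, without rereading the file, is to remember each header in a Set and note any header seen a second time. This is a sketch under assumptions, not a tested solution for the real data: duplicate_headers is a hypothetical name, the input is assumed to be an array of lines, and duplicates are identified by header line only rather than by whole record.

```ruby
require 'set'

# Hypothetical helper: header lines (starting with '>') seen more than once.
def duplicate_headers(lines)
  seen = Set.new
  dups = []
  lines.each do |line|
    next unless line.start_with?('>')
    # Set#add? returns nil when the line is already in the set.
    dups << line if seen.add?(line).nil?
  end
  dups.uniq
end

lines = ['>77777/Bla07', '777', '>66666/Bla06', '666', '>77777/Bla07', '777']
p duplicate_headers(lines)   # => [">77777/Bla07"]
```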

comp.lang.ruby
