Asp Forum - Index of multiple similar strings

Milo Thurston

10/7/2004 12:16:00 PM

I'm trying to read through a file like this:
http://www.genomics.ceh.ac.uk/~milo/ex...
In order to count the number of N tracts and locate their
positions. My code goes like this:

dust_seq = # file in url above
nums = 0
d.dust_seq.scan(/[N]+/) do |blah|
nums += 1
puts "Index #{d.dust_seq.index(blah.to_s)}"
done
puts "Num of Ns: #{nums}"

In the example, the index for the third of the N groups
is reported as the same as the first, as it's small enough
to fit within it. Is there any way around this?
Thanks.

--
www.sirwilliamhope.org

11 Answers

Robert Klemme

10/7/2004 12:28:00 PM

"Milo Thurston" <nospam@linacreschoolofdefence.org> schrieb im Newsbeitrag
news:ck3c16$699$1@news.ox.ac.uk...
> I'm trying to read through a file like this:
> http://www.genomics.ceh.ac.uk/~milo/ex...
> In order to count the number of N tracts and locate their
> positions. My code goes like this:
>
> dust_seq = # file in url above
> nums = 0
> d.dust_seq.scan(/[N]+/) do |blah|
> nums += 1
> puts "Index #{d.dust_seq.index(blah.to_s)}"
> done
> puts "Num of Ns: #{nums}"
>
> In the example, the index for the third of the N groups
> is reported as the same as the first, as it's small enough
> to fit within it. Is there any way around this?
> Thanks.
>
> --
> www.sirwilliamhope.org

dust_seq = # file in url above
nums = 0
dust_seq.scan(/N+/) do |blah|
  nums += 1
  puts "Index #{$`.length}"
end
puts "Num of Ns: #{nums}"
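
For readers following along, here is a minimal sketch of why the original approach repeats the first index and what $` changes. The sample string `seq` and the variable names are made up for illustration; the real data is the file linked in the question.

```ruby
# Hypothetical sample sequence with three N tracts of different lengths.
seq = "ACGTNNNNNACGTNNACGNNNT"

# String#index with a string argument searches from position 0 every time,
# so a short run of Ns is found inside the first, longer run.
naive = seq.scan(/N+/).map { |run| seq.index(run) }

# Inside the scan block, $` holds the text before the current match,
# so its length is the start offset of that match.
starts = []
seq.scan(/N+/) { starts << $`.length }

puts "naive:  #{naive.inspect}"
puts "starts: #{starts.inspect}"
```

The naive version reports the same offset for all three tracts, while the second list gives a distinct start for each.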

Kind regards

robert

Milo Thurston

10/7/2004 12:41:00 PM

Robert Klemme <bob.news@gmx.net> wrote:
> puts "Index #{$`.length}"

Excellent, thanks.
In which book/manual is $` described? I've not seen it before.

--
www.sirwilliamhope.org

Carlos

10/7/2004 12:44:00 PM

[Milo Thurston <nospam@linacreschoolofdefence.org>, 2004-10-07 14.19 CEST]
> I'm trying to read through a file like this:
> http://www.genomics.ceh.ac.uk/~milo/ex...
> In order to count the number of N tracts and locate their
> positions. My code goes like this:
>
> dust_seq = # file in url above
> nums = 0
> d.dust_seq.scan(/[N]+/) do |blah|
> nums += 1
> puts "Index #{d.dust_seq.index(blah.to_s)}"
> done
> puts "Num of Ns: #{nums}"
>
> In the example, the index for the third of the N groups
> is reported as the same as the first, as it's small enough
> to fit within it. Is there any way around this?

(not tested):

nums = 0
idx = 0

while idx = dust_seq.index /N+/, idx
  nums += 1
  puts "Index #{idx}"
  idx = Regexp.last_match.end(0)+1
end
puts "Num of Ns: #{nums}"

Milo Thurston

10/7/2004 1:00:00 PM

Carlos <angus@quovadis.com.ar> wrote:
> while idx = dust_seq.index /N+/, idx

Thanks - the interpreter didn't like this line, though.
However, I got it working and it seems better than the $`
method, which caused some nasty memory hogging problems
(I now regret not compiling in a kernel OOM killer...).
--
www.sirwilliamhope.org

Robert Klemme

10/7/2004 1:01:00 PM

"Milo Thurston" <nospam@linacreschoolofdefence.org> schrieb im Newsbeitrag
news:ck3dg6$6sr$1@news.ox.ac.uk...
> Robert Klemme <bob.news@gmx.net> wrote:
> > puts "Index #{$`.length}"
>
> Excellent, thanks.
> In which book/manual is $` described? I've not seen it before.

It's in the Pickaxe (both versions) although not in the online version of
the first edition AFAIK. You can find out about the other way in the Regexp
doc:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/ref_c_r...
http://www.ruby-doc.org/core/classes/R...

Kind regards

robert

Robert Klemme

10/7/2004 1:12:00 PM

"Milo Thurston" <nospam@linacreschoolofdefence.org> schrieb im Newsbeitrag
news:ck3el8$7dm$1@news.ox.ac.uk...
> Carlos <angus@quovadis.com.ar> wrote:
> > while idx = dust_seq.index /N+/, idx
>
> Thanks - the interpreter didn't like this line, though.
> However, I got it working and it seems better than the $`
> method, which caused some nasty memory hogging problems
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
??? Care to explain?

robert

Carlos

10/7/2004 1:21:00 PM

[Milo Thurston <nospam@linacreschoolofdefence.org>, 2004-10-07 15.04 CEST]
> > while idx = dust_seq.index /N+/, idx
>
> Thanks - the interpreter didn't like this line, though.

You are right, it should have parens:

while idx = dust_seq.index(/N+/, idx)

Strange...
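
Putting the corrected line into a complete loop might look like the sketch below. It assumes a made-up sample string `seq` and collects the offsets in a hypothetical `positions` array instead of printing them. It resumes the search at end(0) rather than end(0)+1; both should behave the same here, since the character right after a run of Ns is never an N.

```ruby
# Hypothetical sample; substitute the real sequence string.
seq = "ACGTNNNNNACGTNNACGNNNT"

count = 0
positions = []
idx = 0

# String#index(regexp, offset) starts searching at offset and returns the
# start of the next match, or nil when there are no more matches.
while (idx = seq.index(/N+/, idx))
  count += 1
  positions << idx
  # end(0) is the offset just past the match, so the search resumes there.
  idx = Regexp.last_match.end(0)
end

puts "Num of Ns: #{count}"
puts "Indexes: #{positions.inspect}"
```

The parentheses around the assignment in the while condition are optional; they only signal that the assignment is intentional.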

ts

10/7/2004 1:24:00 PM

>>>>> "R" == Robert Klemme <bob.news@gmx.net> writes:

>> method, which caused some nasty memory hogging problems
R> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
R> ??? Care to explain?

You create a String object for each call, you don't have this problem with

$~.begin(0)
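
A small sketch of that suggestion, assuming a made-up sample string `seq` and a hypothetical `tracts` array: the MatchData in $~ gives the offsets directly, so no pre-match String has to be built for each tract.

```ruby
# Hypothetical sample; substitute the real sequence string.
seq = "ACGTNNNNNACGTNNACGNNNT"

tracts = []
seq.scan(/N+/) do
  m = $~ # same object as Regexp.last_match
  # begin(0) and end(0) are integer offsets of the whole match.
  tracts << [m.begin(0), m.end(0) - m.begin(0)]
end

# Each entry is [start offset, tract length].
tracts.each { |start, len| puts "Index #{start}, length #{len}" }
puts "Num of Ns: #{tracts.size}"
```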

Guy Decoux

Robert Klemme

10/7/2004 1:25:00 PM

"ts" <decoux@moulon.inra.fr> schrieb im Newsbeitrag
news:200410071323.i97DNZR00289@moulon.inra.fr...
> >>>>> "R" == Robert Klemme <bob.news@gmx.net> writes:
>
> >> method, which caused some nasty memory hogging problems
> R> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> R> ??? Care to explain?
>
> You create a String object for each call, you don't have this problem with
>
> $~.begin(0)

True. I thought of $~ also, but overlooked this aspect - "$`.length" just
looked cuter. :-) Thx.

robert

Milo Thurston

10/7/2004 1:38:00 PM

ts <decoux@moulon.inra.fr> wrote:
> R> ??? Care to explain?
> You create a String object for each call, you don't have this problem with
> $~.begin(0)

That would explain it. Some of the strings I'm looking at are several MB
in size. I've been writing out the data to disk and flushing stdout, but
$` seemed to leave each complete sequence in memory, causing it to run
out rather rapidly.

--
www.sirwilliamhope.org

comp.lang.ruby
