comp.lang.ruby

identify and extract positions from a string - how to?

Marc Hoeppner

7/19/2007 10:31:00 AM

Hi,

I am not quite sure about how to approach the following problem:

I have a long (long long long) string of letters, a genomic sequence
(600k characters+).
Now, what I want to do is to extract certain parts of this string, based
on the position.
So for example let's say I want all characters from position 2340 to
5436.

A quick pointer in the right direction would be much appreciated. I have
a vague idea that it could perhaps be done with count? Like "puts string
where string.count("actg")=2340 until string.count("actg")=5436"... ?
Not sure tho, and probably there are better ways.



Cheers,

Marc

--
Posted via http://www.ruby-....

9 Answers

Thomas Worm

7/19/2007 10:52:00 AM

On Thu, 19 Jul 2007 19:31:12 +0900, Marc Hoeppner wrote:

> Hi,
>
> I am not quite sure about how to approach the following problem:
>
> I have a long (long long long) string of letters, a genomic sequence
> (600k characters+).
> Now, what I want to do is to extract certain parts of this string, based
> on the position.
> So for example lets say I want all characters from position 2340 to
> 5436.

What about

puts "My String"[5..7]
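
Applied to the positions from the question, this could look like the following minimal sketch. The `sequence` value and the `extract` helper are made up for illustration, and it assumes the positions are 0-based and inclusive; subtract 1 from both if your coordinates count from 1.

```ruby
# Hypothetical stand-in for the real genomic sequence (600k characters here).
sequence = "acgt" * 150_000

# Hypothetical helper: returns the characters from first to last, inclusive,
# using 0-based positions.
def extract(sequence, first, last)
  sequence[first..last]
end

fragment = extract(sequence, 2340, 5436)
puts fragment.length  # 5436 - 2340 + 1 = 3097
```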

Thomas

F. Senault

7/19/2007 10:57:00 AM

On 19 July at 12:31, Marc Hoeppner wrote:

> Hi,
>
> I am not quite sure about how to approach the following problem:
>
> I have a long (long long long) string of letters, a genomic sequence
> (600k characters+).
> Now, what I want to do is to extract certain parts of this string, based
> on the position.
> So for example lets say I want all characters from position 2340 to
> 5436.

For example :

>> str = "abcdefghijklmnopqrstuvwxyz"
=> "abcdefghijklmnopqrstuvwxyz"

The simplest way to answer your question is :

>> str[5..11]
=> "fghijkl"

You may want to try the other variants :

>> str[5, 7]
=> "fghijkl"

>> str[/f.*l/]
=> "fghijkl"

>> str['fghijkl']
=> "fghijkl"

If you need to process it character by character, you can use a
multitude of methods :

>> str[5..10].each_byte { |b| puts b.chr }
f
g
h
i
j
k
=> "fghijk"

>> str[5..10].split(//)
=> ["f", "g", "h", "i", "j", "k"]

>> str[5..10].split(//).each { |c| puts c }
f
g
h
i
j
k
=> ["f", "g", "h", "i", "j", "k"]

Etc.

I haven't tried this with very long strings, but I don't see why
range-based access wouldn't be acceptable. (Of course, the regular
expression will be slower.)

Fred
--
I can try to get away but i've strapped myself in
I can try to scratch away the sound in my ears
I can see it killing away all my bad parts (Nine inch Nails,
I don't want to listen but it's all too clear The Becoming)

Marc Hoeppner

7/19/2007 11:16:00 AM

Thanks a lot, don't know how I missed that in the string chapter.

Anyhow, another thing came up:

while string[1..10] is pretty much what I was looking for - is there any
way that I can substitute the numbers (or the whole content of the
square brackets for that matter) with variables?

As it is now I have a file that contains coordinates and a second file
that contains the string that I want to extract from.

So ideally the script would read each line of the coordinate file

45..78
90..120
etc

and uses it in the extraction method

file.readlines each do |l|
puts string[l]
end

Doesn't work though. Any suggestions on how to pipe each line of the
coordinate file to the string method? I know I know, probably simple,
but I am still learning ;)

Cheers,

Marc

--
Posted via http://www.ruby-....

Thomas Worm

7/19/2007 11:54:00 AM

On Thu, 19 Jul 2007 20:16:12 +0900, Marc Hoeppner wrote:

> As it is now I have a file that contains coordinates and a second file
> that contains the string that I want to extract from.
>
> So ideally the script would read each line of the coordinate file
>
> 45..78
> 90..120
> etc

Those ..-things are called ranges, which, what wonder, are a class in
ruby. Have a look at http://corelib.rubyon... for the class Range.

another way to express str[45..78] is str[45,78] or str.slice(45,78) or
str.slice(45..78), where the numbers can be replaced by variables:
str[fr..to], str[fr,to], str.slice(fr,to), str.slice(fr..to)

This information can be found at the same webpage, just look for the
class String ;-)


> and uses it in the extraction method
>
> file.readlines each do |l|
> puts string[l]
> end
>
> Doesnt work tho -any suggestions on how to pipe each line of the
> coordinate file to the string method? I know I know, probably simple,
> but I am still learning ;)

l is a String-object, not a Range-object.

file.readlines.each do |l|
  fr, to = l.split(/\.\./)
  puts string[fr,to]
end

should do the job.

The thingy with the slashes in the split-method is a regular expression.
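
As a rough sketch of what that split does to one line of the coordinate file (the `parse_range` name is hypothetical, and it assumes every line has the form `start..end`):

```ruby
# Hypothetical helper: turns a line such as "45..78\n" into the Range 45..78.
def parse_range(line)
  first, last = line.strip.split(/\.\./)  # "45..78" becomes ["45", "78"]
  (first.to_i..last.to_i)                 # the pieces are strings, so convert them
end

string = "abcdefghijklmnopqrstuvwxyz"
puts string[parse_range("5..11\n")]  # fghijkl
```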

Regards
Thomas

Thomas Worm

7/19/2007 11:57:00 AM

On Thu, 19 Jul 2007 11:54:29 +0000, Thomas Worm wrote:

> puts string[fr,to]

should be

puts string[fr.to_i..to.to_i]

Thomas

F. Senault

7/19/2007 12:04:00 PM

On 19 July at 13:54, Thomas Worm wrote:

> Those ..-things are called ranges, which, what wonder, are a class in
> ruby. Have a look at http://corelib.rubyon... for the class Range.
>
> another way to express str[45..78] is str[45,78]

Nope :

>> str[45..78].length
=> 34
>> str[45,78].length
=> 78

(IOW start_position..end_position versus start_position,length.)
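
To make the difference concrete, here is a small sketch on the alphabet string used earlier in the thread (the variable names are only illustrative):

```ruby
str = "abcdefghijklmnopqrstuvwxyz"

first = 5
last = 11

by_range  = str[first..last]              # start position .. end position (inclusive)
by_length = str[first, last - first + 1]  # start position, number of characters

puts by_range          # fghijkl
puts by_length         # fghijkl
puts str[first, last]  # start 5, length 11: fghijklmnop
```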

Fred
--
I feel it move across my skin. I'm reaching up and reaching out, I'm
reaching for the random or what ever will bewilder me. And following
our will and wind we may just go where no one's been. We'll ride the
spiral to the end and may just go where no one's been. (Tool, Lateralus)

F. Senault

7/19/2007 12:12:00 PM

On 19 July at 13:16, Marc Hoeppner wrote:

> and uses it in the extraction method
>
> file.readlines each do |l|
> puts string[l]
> end

The other solutions in the thread are the ones to use, but I feel the
need to suggest the very dirty / insecure / bad one :

File.readlines(filepath).each do |l|
  puts string[eval(l)]
end

Don't try this at home, etc... :)

(But, in a controlled environment, it may be useful since it allows for
all the variations that can be evaluated in one line of ruby code...)

Fred
--
I don't need no arms around me I don't need no drugs to calm me
I have seen the writing on the wall Don't think I need anything at all
No, don't think I'll need anything at all
(Pink Floyd, Another Brick in The Wall part 3)

Thomas Worm

7/19/2007 12:15:00 PM

On Thu, 19 Jul 2007 14:04:13 +0200, F. Senault wrote:

> On 19 July at 13:54, Thomas Worm wrote:
>
>> Those ..-things are called ranges, which, what wonder, are a class in
>> ruby. Have a look at http://corelib.rubyon... for the class
>> Range.
>>
>> another way to express str[45..78] is str[45,78]
>
> Nope :
>
>>> str[45..78].length
> => 34
>>> str[45,78].length
> => 78
>
> (IOW start_position..end_position versus start_position,length.)
>

I guess you are right. I misinterpreted the documentation, which says in a
number of examples:

a = "hello there"
a[1,3] #=> "ell"
a[1..3] #=> "ell"

I should have taken the time to read the text instead.

Thomas

Robert Klemme

7/19/2007 12:31:00 PM

2007/7/19, F. Senault <fred@lacave.net>:
> On 19 July at 13:16, Marc Hoeppner wrote:
>
> > and uses it in the extraction method
> >
> > file.readlines each do |l|
> > puts string[l]
> > end
>
> The other solutions in the thread are the ones to use, but I feel the
> need to suggest the very dirty / insecure / bad one :
>
> File.readlines(filepath).each do |l|
>   puts string[eval(l)]
> end
>
> Don't try this at home, etc... :)
>
> (But, in a controlled environment, it may be useful since it allows for
> all the variations that can be evaluated in one line of ruby code...)

A safer variant:

file.each do |line|
  if /^(\d+)\.\.(\d+)$/ =~ line
    puts string[ $1.to_i .. $2.to_i ]
  end
end

Note that file.each is more efficient than file.readlines.each
because it does not need to read the whole file into memory.
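
A self-contained sketch of that line-by-line approach, using `File.foreach` from the standard library, which also reads one line at a time. The temporary coordinate file and its contents are invented for the example, and lines that are not of the form `digits..digits` are simply skipped:

```ruby
require "tempfile"

string = "abcdefghijklmnopqrstuvwxyz"
results = []

# Create a small hypothetical coordinate file for the demonstration.
Tempfile.create("coords") do |f|
  f.write("5..11\nnot a range\n0..2\n")
  f.flush

  # File.foreach yields one line at a time instead of loading the whole file.
  File.foreach(f.path) do |line|
    # Only accept lines that consist of "digits..digits".
    if (m = /\A(\d+)\.\.(\d+)\s*\z/.match(line))
      results << string[m[1].to_i..m[2].to_i]
    end
  end
end

puts results  # prints fghijkl and abc on separate lines
```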

Kind regards

robert