egrasso
7/15/2008 3:18:00 AM
Mmmmm... no. I don't think I explained the idea very well. I'm writing a
script to find specific DNA sequences (binding sites) inside a large
DNA sequence (for those who don't know, DNA sequences are made of 4
different bases: A, T, C and G). The problem is that a binding site doesn't
need to match 100% to work. For example, the binding site for protein X
is "AAATTT", but the protein can also bind to the sequence "AAAGTT"
or "AACGTT" and work fine. I need to find all of these sites, but the only
data I have is that "Protein X binds to AAATTT".
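To make "similar enough" concrete, here is a minimal sketch of the measure, assuming similarity means the fraction of positions where two equal-length sequences carry the same base. The helper name `similarity` is made up for illustration and is not part of the script.

```ruby
# Fraction of positions at which two equal-length strings agree
# (1.0 = identical, 0.0 = no position in common).
# NOTE: `similarity` is a hypothetical helper name used only for this sketch.
def similarity(a, b)
  raise ArgumentError, "lengths differ" unless a.length == b.length
  matches = a.chars.zip(b.chars).count { |x, y| x == y }
  matches.to_f / a.length
end

similarity("AAATTT", "AAATTT")  # all 6 positions agree
similarity("AAATTT", "AAAGTT")  # 5 of 6 positions agree
similarity("AAATTT", "AACGTT")  # 4 of 6 positions agree
```

With a minimum similarity of 0.6, all three of the example sites above would count as hits.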
I finally solved the problem without using str.index or a regexp; basically,
I search for it manually:
(Note: the variables are in Spanish! buscarBS=find binding site,
patron=pattern, semejanza=minimum similarity, from 0 to 1, cadena=string,
respuesta=answer, largo=length)
def buscarBS(patron, semejanza=0.6, cadena=@secuencia)
  respuesta = ""
  largoc = cadena.length
  largop = patron.length
  i = 0
  while i <= (largoc - largop)
    j = 0
    puntos = 0.0  # matching bases so far (Float, so the divisions below are not truncated)
    # Keep comparing only while the window can still reach the threshold:
    # puntos + (largop - j) is the best score it could still get.
    while j < largop && (puntos + largop - j) / largop >= semejanza
      puntos += 1 if cadena[i + j] == patron[j]
      j += 1
    end
    if (puntos / largop) >= semejanza
      # Positions are reported 1-based ("desde" = from, "hasta" = to).
      respuesta += "desde: #{i + 1} hasta: #{i + j} - similitud: - #{puntos / largop * 100}%\n"
    end
    i += 1
  end
  if respuesta == ""
    # "No similar sequence was found"
    respuesta = "No se encontro ninguna secuencia similar (similitud: #{semejanza} - #{patron})"
  else
    # "The following similarities were found"
    respuesta = "\nSe encontraron las siguientes similitudes:\n\n" + respuesta
  end
  return respuesta
end
I still need to polish and optimize the code, but it finds all possible
sites with at least a given similarity and tells me how similar they
are. If anyone has another idea, needs more details about the code, or is
interested in bioinformatics with Ruby, tell me.
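As one possible direction for the polishing, here is a rough sketch of the same sliding-window scan that returns data instead of a formatted string. The method name `find_binding_sites`, its argument order, and the hash keys are all assumptions for illustration; it compares every position of each window, with no early exit.

```ruby
# Hypothetical alternative to buscarBS: slides a window of pattern.length
# over the sequence and collects every window whose fraction of matching
# positions is at least min_similarity. Positions are 1-based, as in buscarBS.
def find_binding_sites(pattern, sequence, min_similarity = 0.6)
  len = pattern.length
  return [] if len.zero? || sequence.length < len

  hits = []
  (0..sequence.length - len).each do |start|
    window = sequence[start, len]
    matches = (0...len).count { |k| window[k] == pattern[k] }
    score = matches.to_f / len
    next if score < min_similarity
    hits << { from: start + 1, to: start + len, similarity: score }
  end
  hits
end

find_binding_sites("AAATTT", "GGAAAGTTCC").each do |hit|
  puts "from: #{hit[:from]} to: #{hit[:to]} - similarity: #{(hit[:similarity] * 100).round(1)}%"
end
```

Returning an array of hashes leaves the formatting to the caller, so the report text can stay in Spanish or anything else without touching the search itself.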
Thanks
On Mon, 14 Jul 2008 23:15:30 +0900, phlip <phlip2005@gmail.com> wrote:
>> The problem is that now I need to
>> find all positions where the pattern matches 12 or more chars.
>> For example: For the pattern "aaaaaa", find substrings "aaaaaa",
>> "aaabaa", "baaaaa", "ababaa", etc
>>
>> First I thought that I could create all possible patterns (with \w)
>
> \w{12,}
>
> Right?
>
> Either that or \w{12}\w*