comp.lang.python

concatenate fasta file

Matt

2/12/2010 4:07:00 PM

Hi all, I have a simple problem that I hope somebody can help with. I
have an input file (a FASTA file) that I need to edit.

Input file format:

>name 1
tactcatacatac
>name 2
acggtggcat
>name 3
gggtaccacgtt

I need to concatenate the sequences so that the output looks like this:

>concatenated
tactcatacatacacggtggcatgggtaccacgtt

Thanks,
Matt
3 Answers

Roy Smith

2/12/2010 4:24:00 PM


In article
<62a50def-e391-4585-9a23-fb91f2e2edc8@b9g2000pri.googlegroups.com>,
PeroMHC <macmanes@gmail.com> wrote:

Some quick ideas. First, try something along these lines (untested):

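import sys

data = []
for line in sys.stdin:
    if line.startswith('>'):
        continue  # skip FASTA header lines
    data.append(line.strip())  # keep the sequence, minus its newline
print('>concatenated')
print(''.join(data))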

Second, check out http://biopython.org/wiki.... I'm sure somebody
has solved this problem before.
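The loop above can be wrapped in a small function so it accepts any iterable of lines (an open file, sys.stdin, or a plain list), which also makes it easy to test. This is only a sketch: the name concatenate_fasta and the header default are illustrative assumptions, not anything from Biopython.

```python
def concatenate_fasta(lines, header="concatenated"):
    """Return one FASTA record built from every sequence line in `lines`.

    Hypothetical helper: `lines` may be an open file, sys.stdin, or a list.
    """
    parts = []
    for line in lines:
        if line.startswith(">"):
            continue  # skip the per-record header lines
        parts.append(line.strip())  # drop the newline and stray whitespace
    return ">" + header + "\n" + "".join(parts) + "\n"


example = """>name 1
tactcatacatac
>name 2
acggtggcat
>name 3
gggtaccacgtt
"""

# Prints ">concatenated" followed by the joined sequence on one line.
print(concatenate_fasta(example.splitlines()), end="")
```

To run it on a file, something like `sys.stdout.write(concatenate_fasta(open('input.fa')))` should work. The sketch sticks to the standard library; with Biopython installed, `Bio.SeqIO.parse(handle, "fasta")` yields records whose `.seq` values could be joined in the same way.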

Jean-Michel Pichavant

2/12/2010 4:49:00 PM


PeroMHC wrote:
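A solution using a regular expression:

import re

found = []
with open('seqfile.txt') as seqfile:
    for line in seqfile:
        # keep only lines made entirely of nucleotide letters (skips '>' headers)
        found += re.findall(r'^[acgtACGT]+$', line)

print(found)
# with the example input: ['tactcatacatac', 'acggtggcat', 'gggtaccacgtt']

print(''.join(found))
# with the example input: tactcatacatacacggtggcatgggtaccacgtt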
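One caveat with the regular-expression filter: a sequence line containing any character outside acgtACGT, such as the IUPAC ambiguity code N or a trailing space, is silently dropped instead of being kept or reported. The comparison below uses a made-up input (the second record is hypothetical) to contrast it with filtering on the '>' header instead:

```python
import re

# Hypothetical input: the second record contains an 'N' (unknown base).
lines = [">name 1\n", "tactcatacatac\n", ">name 2\n", "acggNggcat\n"]

# Regex approach: only lines made purely of a/c/g/t survive.
by_regex = []
for line in lines:
    by_regex += re.findall(r'^[acgtACGT]+$', line)

# Header approach: keep every line that is not a '>' header.
by_header = [line.strip() for line in lines if not line.startswith('>')]

print(''.join(by_regex))   # tactcatacatac (the 'N' line was dropped)
print(''.join(by_header))  # tactcatacatacacggNggcat
```

Which behaviour is preferable depends on whether unexpected characters should be discarded or passed through.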


JM

Dave "Crash" Dummy

2/13/2010 3:15:00 PM


On 2010-02-12, PeroMHC <macmanes@gmail.com> wrote:

(echo ">concatenated"; grep '^[actg]*$' inputfile | tr -d '\n'; echo) > outputfile
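The pipeline above streams the input and never holds the whole sequence in memory. A Python sketch with the same streaming behaviour writes each sequence line straight to the output file instead of collecting a list first; the function name concatenate_file and its parameters are hypothetical:

```python
def concatenate_file(in_path, out_path, header="concatenated"):
    """Stream in_path to out_path as a single FASTA record (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.write(">" + header + "\n")  # like the first `echo`
        for line in src:
            if line.startswith(">"):
                continue  # skip the original headers
            dst.write(line.strip())  # like `tr -d '\n'`: no newline between parts
        dst.write("\n")  # terminate the sequence line, like the final `echo`
```

Usage would be along the lines of `concatenate_file('inputfile', 'outputfile')`.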

--
Grant