comp.lang.ruby

Text parser / reformatting

Marc Hoeppner

7/9/2007 7:43:00 AM

Hi everyone,

I expect this is a rather trivial problem, but I just started using Ruby
and am a bit stuck right now.
Here is what I want to do:

I have a text file that contains information in the following format:

KOG0003
 At2g36170
 At3g52590
 CE15495
 7295730
KOG0004
 Hs20476120
 YIL148w
 YKR094c
 SPAC11G7.04

Now, this has to go into a relational database. But right now this is
not really a table. The desired output would look something like this:

KOG0003 At2g36170
KOG0003 At3g52590
KOG0003 CE15495
KOG0003 7295730
KOG0004 Hs20476120
KOG0004 YIL148w
KOG0004 YKR094c

Well, you get the picture. What I tried to do is to read the text file,
then look for lines that start with a blank and replace that blank with
the first word of the previous line, given that this line does in fact
start with a word (it could also be selected by using KOG[0-9]*). I
thought of storing the KOG[0-9]* identifier in a variable, but overall I
can't make it work and have no real idea how to solve this. Any help would
be greatly appreciated. I guess for an experienced user this is a
three-liner.

Cheers,

Marc

--
Posted via http://www.ruby-....

3 Answers

Robert Klemme

7/9/2007 8:03:00 AM

Hm... Maybe something like this:

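key = nil
ARGF.each do |line|
  line.chomp!
  case line
  when /^(\S+)/       # unindented line: remember it as the current key
    key = line.strip
  when /^\s+(\S+)/    # indented line: print it next to the current key
    print key, " ", $1, "\n" if key
  else
    # ignore
  end
end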
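
To try the same loop without a file, here is a minimal sketch that wraps it in a method and returns the pairs instead of printing them. The method name `pair_lines` is made up for illustration, and the sketch assumes, as in the original file, that member lines are indented and key lines are not.

```ruby
# Hypothetical helper, not part of the original post: same state machine as
# the loop above, but it collects [key, member] pairs instead of printing.
def pair_lines(io)
  key = nil
  pairs = []
  io.each_line do |line|
    case line
    when /^(\S+)/        # unindented line: remember it as the current key
      key = $1
    when /^\s+(\S+)/     # indented line: pair it with the current key
      pairs << [key, $1] if key
    end
  end
  pairs
end

# A String responds to each_line, so no file is needed for a quick check.
sample = "KOG0003\n At2g36170\n CE15495\nKOG0004\n YIL148w\n"
pair_lines(sample).each { |k, v| puts "#{k} #{v}" }
```

Returning an array rather than printing makes it easier to hand the pairs to whatever loads the database table. Indented lines that appear before any key are silently skipped.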

Kind regards

robert

Alex Gutteridge

7/9/2007 8:04:00 AM

Not a very fancy solution, but it seems to work for the data you
posted. Also uses the pattern you suggested, storing the KOG*
identifier in a variable (field1):

[alexg@powerbook]/Users/alexg/Desktop(7): cat test.rb
field1 = nil
IO.foreach(ARGV[0]) do |l|
  if l.match(/^(\S+)/)
    field1 = $1
  else
    puts "#{field1} #{l.strip}"
  end
end
[alexg@powerbook]/Users/alexg/Desktop(8): cat data.dat
KOG0003
 At2g36170
 At3g52590
 CE15495
 7295730
KOG0004
 Hs20476120
 YIL148w
 YKR094c
 SPAC11G7.04
[alexg@powerbook]/Users/alexg/Desktop(9): ruby test.rb data.dat
KOG0003 At2g36170
KOG0003 At3g52590
KOG0003 CE15495
KOG0003 7295730
KOG0004 Hs20476120
KOG0004 YIL148w
KOG0004 YKR094c
KOG0004 SPAC11G7.04
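
The original question also mentions selecting the group lines with a pattern such as KOG[0-9]* rather than relying on indentation. A minimal sketch of that variant follows, assuming every group identifier is `KOG` followed only by digits and that no member identifier has that shape. The names `KEY_PATTERN` and `pairs_by_pattern` and the tab-separated output are illustrative choices, not something from this thread.

```ruby
# Hypothetical variant: recognise group headers by their KOG prefix instead of
# by indentation, so it still works if leading whitespace has been lost.
KEY_PATTERN = /\AKOG\d+\z/

def pairs_by_pattern(lines)
  key = nil
  lines.each_with_object([]) do |raw, rows|
    word = raw.strip
    next if word.empty?            # skip blank lines
    if word.match?(KEY_PATTERN)
      key = word                   # new group header
    elsif key
      rows << "#{key}\t#{word}"    # one tab-separated row per member
    end
  end
end

data = %w[KOG0003 At2g36170 7295730 KOG0004 SPAC11G7.04]
puts pairs_by_pattern(data)
```

The trade-off is that a member whose name happens to match the pattern would be treated as a new group, so the indentation-based version is safer when the whitespace is intact.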

Alex Gutteridge

Bioinformatics Center
Kyoto University



Marc Hoeppner

7/9/2007 8:11:00 AM

Thanks, you two, that worked like a charm!


Cheers,

Marc
