comp.lang.ruby

Parsing challenge...

Artco News

10/7/2003 6:48:00 PM

I thought I'd ask the scripting gurus about the following.

I have a file containing records of data in the following format (first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse this so I can insert the records into a database, e.g. MySQL/Access?

Perhaps there is an advanced scripting language that can do this easily.

Thanks


7 Answers

ahoward

10/7/2003 10:23:00 PM


Paulus Magnus

10/7/2003 11:52:00 PM



"Artco News" <artconews@verizon.net> wrote in message
news:CTDgb.31522$yU5.13084@nwrdny01.gnilink.net...
> I thought I ask the scripting guru about the following.
>
> I have a file containing records of data with the following format(first
> column is the label):
>
> CODE#1^DESCRIPTION^CODE#2^NOTES
> NN-110^an info of NN-001^BRY234^some notes
> NN-111^1st line data
> 2nd line data
> 3rd line data^BRT345^another notes
> NN-112^description of NN-112^BBC23^multiline
> notes blah
> blah
> blah
> NN-113^info info^MNO12^some notes here
>
> How do I parse so I can insert them in the database, e.g. MySQL/Access?

<?php
// Assuming we use file() to read the file, we get each line in an array,
// so $testdata stands in for our sample file.
$testdata = array();
$testdata[] = "CODE#1^DESCRIPTION^CODE#2^NOTES\r\n";
$testdata[] = "NN-110^an info of NN-001^BRY234^some notes\r\n";
$testdata[] = "NN-111^1st line data\r\n";
$testdata[] = "2nd line data\r\n";
$testdata[] = "3rd line data^BRT345^another notes\r\n";
$testdata[] = "NN-112^description of NN-112^BBC23^multiline\r\n";
$testdata[] = "notes blah\r\n";
$testdata[] = "blah\r\n";
$testdata[] = "blah\r\n";
$testdata[] = "NN-113^info info^MNO12^some notes here\r\n";

$dbdata = array(); // one joined string per record
$row = "";         // the record currently being assembled
$cnt = 0;          // delimiters seen so far in $row
foreach ($testdata as $line) {
    // Count the ^ delimiters on this line.
    $delimiters = substr_count($line, "^");
    if (($cnt + $delimiters) > 3) {
        // A record has only three delimiters, so $line starts a new record.
        $dbdata[] = $row;
        $cnt = $delimiters;
        $row = $line;
    } else {
        // Otherwise $line continues the current record.
        $row .= $line;
        $cnt += $delimiters;
    }
}
$dbdata[] = $row; // flush the last record
print_r($dbdata);
?>

.... produces ...

Array (
[0] => CODE#1^DESCRIPTION^CODE#2^NOTES
[1] => NN-110^an info of NN-001^BRY234^some notes
[2] => NN-111^1st line data 2nd line data 3rd line data^BRT345^another notes
[3] => NN-112^description of NN-112^BBC23^multiline notes blah blah blah
[4] => NN-113^info info^MNO12^some notes here
)

You can then easily iterate through this array, exploding each line by the ^
and creating the INSERT INTO table VALUES (); bits of SQL.
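
That last step could look roughly like the following, sketched in Ruby (the group's language) rather than PHP. The table name "records" and the column names are hypothetical, not from the original post. The statement uses ? placeholders so that the values can be handed to whichever database driver is in use, rather than being pasted into the SQL text.

```ruby
# Turn one joined record ("a^b^c^d") into [sql, values].
# The table and column names passed in below are hypothetical.
def insert_statement(record, table, columns)
  # A limit of -1 keeps trailing empty fields, so blank cells survive.
  values = record.chomp.split("^", -1)
  unless values.size == columns.size
    raise ArgumentError, "expected #{columns.size} fields, got #{values.size}"
  end
  placeholders = Array.new(columns.size, "?").join(", ")
  sql = "INSERT INTO #{table} (#{columns.join(', ')}) VALUES (#{placeholders})"
  [sql, values]
end

table   = "records"                            # hypothetical table name
columns = %w[code1 description code2 notes]    # hypothetical column names

# Two already-joined records, as the loop above would produce them.
records = [
  "NN-110^an info of NN-001^BRY234^some notes\r\n",
  "NN-111^1st line data\r\n2nd line data\r\n3rd line data^BRT345^another notes\r\n",
]

statements = records.map { |r| insert_statement(r, table, columns) }
statements.each do |sql, values|
  puts sql
  p values
end
```

Each pair can then be passed to the driver's prepared-statement call. Only the trailing line ending is stripped, so line breaks inside a multi-line field are preserved.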

Paulus


Useko Netsumi

10/8/2003 5:49:00 AM


This script fails if any of the cells is blank (has no value),
e.g.:

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^^^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here


"Ara.T.Howard" <ahoward@fsl.noaa.gov> wrote in message
news:Pine.LNX.4.53.0310072218560.32521@eli.fsl.noaa.gov...
> On Tue, 7 Oct 2003, Artco News wrote:
>
> > I thought I ask the scripting guru about the following.
> >
> > I have a file containing records of data with the following format(first
> > column is the label):
> >
> > CODE#1^DESCRIPTION^CODE#2^NOTES
> > NN-110^an info of NN-001^BRY234^some notes
> > NN-111^1st line data
> > 2nd line data
> > 3rd line data^BRT345^another notes
> > NN-112^description of NN-112^BBC23^multiline
> > notes blah
> > blah
> > blah
> > NN-113^info info^MNO12^some notes here
> >
> > How do I parse so I can insert them in the database, e.g. MySQL/Access?
> >
> > Perhaps there are an advanced scripting language can do this easily.
>
> ruby is one of the more advanced :-)
>
> ~/eg/ruby > cat ./parse.rb
>
> #!/usr/bin/env ruby
>
> txt = <<-txt
> CODE#1^DESCRIPTION^CODE#2^NOTES
> NN-110^an info of NN-001^BRY234^some notes
> NN-111^1st line data
> 2nd line data
> 3rd line data^BRT345^another notes
> NN-112^description of NN-112^BBC23^multiline
> notes blah
> blah
> blah
> NN-113^info info^MNO12^some notes here
> txt
>
>
> pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
> tuples = txt.scan pat
>
> tuples.map{|tuple| p tuple}
>
>
> ~/eg/ruby > ./parse.rb
>
> [" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
> [" NN-110", "an info of NN-001", "BRY234", "some notes"]
> [" NN-111", "1st line data\n 2nd line data\n 3rd line data", "BRT345", "another notes"]
> [" NN-112", "description of NN-112", "BBC23", "multiline\n notes blah\n blah\n blah"]
> [" NN-113", "info info", "MNO12", "some notes here"]
>
> -a
> ====================================
> | Ara Howard
> | NOAA Forecast Systems Laboratory
> | Information and Technology Services
> | Data Systems Group
> | R/FST 325 Broadway
> | Boulder, CO 80305-3328
> | Email: ahoward@noaa.gov
> | Phone: 303-497-7238
> | Fax: 303-497-7259
> | The difference between art and science is that science is what we understand
> | well enough to explain to a computer. Art is everything else.
> | -- Donald Knuth, "Discover"
> | ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
> ====================================


Useko Netsumi

10/8/2003 6:11:00 AM


Got it! I just have to replace each plus sign (+) with an asterisk (*) so
that a field can be blank or any string.

Next, how do I insert those values into a MySQL database, assuming I have
the tables defined? Thanks.


Harry Ohlsen

10/8/2003 6:16:00 AM


Useko Netsumi wrote:

> this script failed if any of the cell is blank/no-value,
> e.g:

You may be able to simply change each of the "+" (one or more) in the regex into "*" (zero or more). I gave it a quick test and it seems to work OK, but I didn't test very hard.

pat = %r{([^^]*)\^([^^]*)\^([^^]*)\^([^^]*)\n}mox
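
A minimal sketch to check that change against the blank-cell sample from earlier in the thread (shortened by one record). It assumes Ruby 2.3 or later for the `<<~` heredoc, which is not in the original posts; the pattern itself is the one above.

```ruby
# Sample data from the thread; the first data record has two blank cells.
txt = <<~TXT
  CODE#1^DESCRIPTION^CODE#2^NOTES
  NN-110^^^some notes
  NN-111^1st line data
  2nd line data
  3rd line data^BRT345^another notes
  NN-113^info info^MNO12^some notes here
TXT

# "*" (zero or more) lets a field be empty; "+" required at least one character.
pat = %r{([^^]*)\^([^^]*)\^([^^]*)\^([^^]*)\n}mox
tuples = txt.scan(pat)

# Blank cells come back as empty strings; multi-line cells keep their newlines.
tuples.each { |tuple| p tuple }
```

Note that the pattern still relies on every record containing exactly three carets, so a caret inside a field would throw the grouping off.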

Good luck.

Harry O.



Robert Klemme

10/8/2003 8:34:00 AM



"Artco News" <artconews@verizon.net> wrote in message
news:CTDgb.31522$yU5.13084@nwrdny01.gnilink.net...
> I thought I ask the scripting guru about the following.
>
> I have a file containing records of data with the following format(first
> column is the label):
>
> CODE#1^DESCRIPTION^CODE#2^NOTES
> NN-110^an info of NN-001^BRY234^some notes
> NN-111^1st line data
> 2nd line data
> 3rd line data^BRT345^another notes
> NN-112^description of NN-112^BBC23^multiline
> notes blah
> blah
> blah
> NN-113^info info^MNO12^some notes here
>
> How do I parse so I can insert them in the database, e.g. MySQL/Access?
>
> Perhaps there are an advanced scripting language can do this easily.

Ruby:

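#!/usr/bin/ruby

FIELDS = 4

def process(rec)
  # db insertion here
  p rec
end

rec = []

while ( line = gets )
  # a limit of -1 keeps blank trailing fields
  parts = line.chomp.split('^', -1)
  if rec.empty?
    rec = parts
  elsif rec.size + parts.size - 1 > FIELDS
    # too many fields for one record, so this line starts the next one
    process rec
    rec = parts
  else
    # continuation line: its first piece extends the last open field
    rec[-1] += "\n" + parts.shift.to_s
    rec.concat( parts )
  end
end

process rec unless rec.empty?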

ahoward

10/8/2003 2:59:00 PM
