comp.lang.ruby

Parsing challenge...

Artco News

10/7/2003 6:50:00 PM

I thought I'd ask the scripting gurus about the following.

I have a file containing records of data in the following format (first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse these records so I can insert them into a database, e.g. MySQL/Access?

Perhaps there is an advanced scripting language that can do this easily.

Thanks


2 Answers

Justin Koivisto

10/7/2003 7:32:00 PM

Artco News wrote:
> I thought I ask the scripting guru about the following.
>
> I have a file containing records of data with the following format(first
> column is the label):
>
> CODE#1^DESCRIPTION^CODE#2^NOTES
> NN-110^an info of NN-001^BRY234^some notes
> NN-111^1st line data
> 2nd line data
> 3rd line data^BRT345^another notes
> NN-112^description of NN-112^BBC23^multiline
> notes blah
> blah
> blah
> NN-113^info info^MNO12^some notes here
>
> How do I parse so I can insert them in the database, e.g. MySQL/Access?
>
> Perhaps there are an advanced scripting language can do this easily.

Regex is your friend...

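<?php
// Read the whole file into memory.
$content = file_get_contents('data.txt');
// "." does not match newlines by default, so replace every line break with a
// placeholder token first (the current timestamp, assumed absent from the data).
$tmp = (string) time();
$content = preg_replace('/(\r\n|\r|\n)/', $tmp, $content);
// Ungreedy (U) match: capture everything between "NN-111^" and the next caret,
// i.e. the description field of record NN-111.
$pattern = '/NN-111\^(.*)\^/U';
preg_match($pattern, $content, $matches);
// Split the captured field back into its original lines.
$data = explode($tmp, $matches[1]);
unset($matches, $content, $tmp);
echo '<pre>';
print_r($data);
echo '</pre>';
?>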

This will get you an array with each line of the NN-111 description as a
separate element. You should be able to see from the example how to
extract the notes and the other fields. I may be wrong, but it looks
like the caret (^) is used as a field delimiter, as is the newline.
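
The same regex idea can be stretched to cover every record rather than just NN-111. What follows is only a rough sketch in Python, not part of the PHP above. It assumes every record starts with NN- followed by digits at the beginning of a line, that each record has exactly four caret-separated fields, and that no notes line itself starts with such a code. The names SAMPLE, RECORD and parse_records are made up for the illustration.

```python
import re

# Sample data copied from the question.
SAMPLE = """\
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
"""

# Fields 1-3 end at the next caret. The last field runs (lazily) up to the
# line break before the next "NN-<digits>^" or to the end of the text.
RECORD = re.compile(
    r"^(NN-\d+)\^([^\^]*)\^([^\^]*)\^(.*?)(?=\nNN-\d+\^|\s*\Z)",
    re.MULTILINE | re.DOTALL,
)


def parse_records(text):
    """Return one (code1, description, code2, notes) tuple per record."""
    return [match.groups() for match in RECORD.finditer(text)]


for code1, description, code2, notes in parse_records(SAMPLE):
    # Embedded newlines are preserved inside description and notes.
    print(code1, code2, description.split("\n"), notes.split("\n"))
```

The header line is skipped simply because it does not start with NN- and digits. re.DOTALL plays the role of the newline-replacement trick: it lets the last field span several lines without a placeholder token.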

--
Justin Koivisto - spam@koivi.com
PHP POSTERS: Please use comp.lang.php for PHP related questions,
alt.php* groups are not recommended.

Ed Morton

10/7/2003 8:04:00 PM


Artco News wrote:
> I thought I ask the scripting guru about the following.
>
> I have a file containing records of data with the following format(first
> column is the label):
>
> CODE#1^DESCRIPTION^CODE#2^NOTES
> NN-110^an info of NN-001^BRY234^some notes
> NN-111^1st line data
> 2nd line data
> 3rd line data^BRT345^another notes
> NN-112^description of NN-112^BBC23^multiline
> notes blah
> blah
> blah
> NN-113^info info^MNO12^some notes here
>
> How do I parse so I can insert them in the database, e.g. MySQL/Access?
>
> Perhaps there are an advanced scripting language can do this easily.
>
> Thanks
>

This will parse them to make the records/fields obvious:

gawk 'BEGIN { pat = "NN-"; RS = "\n" pat; FS = "^" }
{
    printf("Record %d = {\n", NR)
    $1 = pat $1    # RS consumed the leading NN-, so put it back
    for (i = 1; i <= NF; i++) {
        printf("\tField %d = { %s }\n", i, $i)
    }
    printf("}\n")
}' inputfile

It'd be trivial to modify the output to whatever format your database
expects. I used NN- at the start of a line as the record separator,
hence the special handling of the first field to put that NN- back. When
run on your sample input file, this produces:

Record 1 = {
	Field 1 = { NN-CODE#1 }
	Field 2 = { DESCRIPTION }
	Field 3 = { CODE#2 }
	Field 4 = { NOTES }
}
Record 2 = {
	Field 1 = { NN-110 }
	Field 2 = { an info of NN-001 }
	Field 3 = { BRY234 }
	Field 4 = { some notes }
}
Record 3 = {
	Field 1 = { NN-111 }
	Field 2 = { 1st line data
2nd line data
3rd line data }
	Field 3 = { BRT345 }
	Field 4 = { another notes }
}
Record 4 = {
	Field 1 = { NN-112 }
	Field 2 = { description of NN-112 }
	Field 3 = { BBC23 }
	Field 4 = { multiline
notes blah
blah
blah }
}
Record 5 = {
	Field 1 = { NN-113 }
	Field 2 = { info info }
	Field 3 = { MNO12 }
	Field 4 = { some notes here
 }
}
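
For the database step itself, here is one possible sketch in Python. It uses the same "NN- at the start of a line" record boundary as the gawk script and then inserts the rows. It makes several assumptions: sqlite3 stands in for MySQL/Access only because it ships with Python, the table name items and its column names are invented, and the header line is dropped instead of being turned into a record.

```python
import re
import sqlite3

# Sample data copied from the question.
SAMPLE = """\
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
"""


def split_records(text):
    """Split text into 4-field rows, one per NN- record."""
    # Break only at newlines that are followed by NN-; the lookahead keeps
    # the NN- prefix in place, so it does not have to be re-added.
    chunks = re.split(r"\n(?=NN-)", text.rstrip("\n"))
    rows = [chunk.split("^") for chunk in chunks]
    # Drop the header line and anything that is not a complete record.
    return [row for row in rows if row[0].startswith("NN-") and len(row) == 4]


def load(rows):
    """Insert rows into a throwaway in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        "code1 TEXT PRIMARY KEY, description TEXT, code2 TEXT, notes TEXT)"
    )
    # Placeholders let the driver handle quotes and embedded newlines.
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = load(split_records(SAMPLE))
for row in conn.execute("SELECT code1, code2 FROM items ORDER BY code1"):
    print(row)
```

Passing the fields as query parameters, rather than pasting them into the SQL string, means the multi-line descriptions and notes need no escaping. Most MySQL drivers offer the same mechanism, though their placeholder syntax may differ.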

Regards,

Ed.