comp.lang.ruby

regexp working with mixed lines endings

pere.noel

11/25/2006 7:38:00 AM

Hey all,

I have an audio file (WAV) containing some XML metadata at the start or
end of the audio data.

My regexp works fine with Unix line endings.

However, some recorders write mixed line endings, and there my regexp
doesn't work.

Is there a special option that works with all kinds of line endings?

My regexps:

rgxstart=Regexp.new("<BWFXML>")
rgxstop=Regexp.new("</BWFXML>")


The comparison I do:

rgxstart === l.chomp

l being:

File.open(<the sound file>).each { |l| ...}
--
une bévue
5 Answers

Paul Lutus

11/25/2006 8:08:00 AM

Une bévue wrote:

/ ...

> Is there a special option that works with all kinds of line endings?

Sure. For mixed Windows and Unix/Linux line endings, just delete the
carriage returns:

data.gsub!(/\r/,"")

>
> My regexps:
>
> rgxstart=Regexp.new("<BWFXML>")
> rgxstop=Regexp.new("</BWFXML>")
>
>
> The comparison I do:
>
> rgxstart === l.chomp
>
> l being:
>
> File.open(<the sound file>).each { |l| ...}

Try this instead:

data = File.read(filename)

data.gsub!(/\r/,"")

array = []

data.split("\n").each do |line|
  # process lines here
  array << line
end

By using this approach, all your XML lines will be made uniform. At the end
of the processing, you will need to reintegrate the lines into a block for
storage:

data = array.join("\n")

File.open(filename,"w") { |f| f.write data }

--
Paul Lutus
http://www.ara...

pere.noel

11/25/2006 10:11:00 AM

Paul Lutus <nospam@nosite.zzz> wrote:

> Sure. For mixed Windows and Unix/Linux line endings, just delete the
> carriage returns:
>
> data.gsub!(/\r/,"")
>
> >
<snip />
> Try this instead:
>
> data = File.read(filename)
>
> data.gsub!(/\r/,"")
>
> array = []
>
> data.split("\n").each do |line|
> # process lines here
> array << line
> end
>
> By using this approach, all your XML lines will be made uniform. At the end
> of the processing, you will need to reintegrate the lines into a block for
> storage:
>
> data = array.join("\n")
>
> File.open(filename,"w") { |f| f.write data }

OK, fine, thanks very much. It's a nice solution, in effect "normalizing"
the win* line endings ;-)


In fact I slightly modified what you wrote:
data.gsub!(/\r\n/,"\n")
data.gsub!(/\r/,"\n")

because in the meantime I discovered that I could have:
\r
\n
\r\n

line endings )))
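
A small sketch of why the order of the two substitutions above matters. The sample string is invented for illustration; only the two gsub calls come from the post.

```ruby
# Invented sample containing all three endings: CRLF, lone CR, lone LF.
mixed = "one\r\ntwo\rthree\n"

# Order used in the post: collapse CRLF first, then convert leftover lone CRs.
good = mixed.gsub(/\r\n/, "\n").gsub(/\r/, "\n")

# Reversed order: every CR becomes LF first, so each CRLF turns into two LFs
# and an empty line appears.
bad = mixed.gsub(/\r/, "\n").gsub(/\r\n/, "\n")

p good.lines.size
p bad.lines.size
```

With the CRLF substitution first, the sample stays at three lines; with the order reversed, a spurious empty line shows up after "one".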

Does \n\r exist? (Wikipedia says no.)

Also, because most of the audio input file is "binary" data, line
endings are meaningless there, I suppose.

Anyway, thanks a lot, I'm now "armed" to face any situation ;-)

Right now, with the first two example files, running my wav2xml and
opening the resulting XML file gave me syntax-colored results (in two
different text editors), so I think that proves the problem is cured!
--
une bévue

Paul Lutus

11/25/2006 11:49:00 AM

Une bévue wrote:

> Paul Lutus <nospam@nosite.zzz> wrote:
>
>> Sure. For mixed Windows and Unix/Linux line endings, just delete the
>> carriage returns:
>>
>> data.gsub!(/\r/,"")
>>
>> >
> <snip />
>> Try this instead:
>>
>> data = File.read(filename)
>>
>> data.gsub!(/\r/,"")
>>
>> array = []
>>
>> data.split("\n").each do |line|
>> # process lines here
>> array << line
>> end
>>
>> By using this approach, all your XML lines will be made uniform. At the
>> end of the processing, you will need to reintegrate the lines into a
>> block for storage:
>>
>> data = array.join("\n")
>>
>> File.open(filename,"w") { |f| f.write data }
>
> OK, fine, thanks very much. It's a nice solution, in effect "normalizing"
> the win* line endings ;-)
>
>
> In fact I slightly modified what you wrote:
> data.gsub!(/\r\n/,"\n")
> data.gsub!(/\r/,"\n")

What's the point? You have the following possibilities:

\r\n

\n\r

\n

All of these cases are handled by my posted method.

>
> because in the meantime I discovered that I could have:
> \r
> \n
> \r\n
>
> line endings )))

Okay, the first ("\r") might be old-style Macintosh line endings. Here is a
solution for all the possibilities:

data.gsub!(%r{(\r\n|\n\r|\r)},"\n")

>
> Does \n\r exist? (Wikipedia says no.)

Doesn't matter. Someone might type it in manually. If it exists, the above
method will handle it.
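
A quick check of the one-pass substitution above, as a minimal sketch. The helper name normalize_line_endings and the sample string are made up for illustration; the alternation is the one from the post, minus the capture group, which the replacement does not need.

```ruby
# Hypothetical helper wrapping the substitution shown above.
def normalize_line_endings(text)
  # The two-character endings are listed before the lone "\r" so that a CRLF
  # (or LFCR) pair is replaced as one unit rather than as two newlines.
  text.gsub(/\r\n|\n\r|\r/, "\n")
end

# Invented sample mixing CRLF, lone CR, lone LF and LFCR.
sample = "a\r\nb\rc\nd\n\re"
normalized = normalize_line_endings(sample)
p normalized
```

A lone "\n" is not matched by any alternative, so it passes through unchanged.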

>
> Also, because most of the audio input file is "binary" data, line
> endings are meaningless there, I suppose.

What? You are reading binary files? Then don't try to filter line endings.

If the file is text, you can filter line endings. Use the above method.

If the file is not text, do not filter anything.

--
Paul Lutus
http://www.ara...

pere.noel

11/25/2006 12:37:00 PM

Paul Lutus <nospam@nosite.zzz> wrote:

>
> Okay, the first ("\r") might be old-style Macintosh line endings. Here is a
> solution for all the possibilities:
>
> data.gsub!(%r{(\r\n|\n\r|\r)},"\n")
>
> >
> > Does \n\r exist? (Wikipedia says no.)
>
> Doesn't matter. Someone might type it in manually. If it exists, the above
> method will handle it.

OK, thanks, I'll try that ASAP.

> >
> > Also, because most of the audio input file is "binary" data, line
> > endings are meaningless there, I suppose.
>
> What? You are reading binary files? Then don't try to filter line endings.

BUT I DON'T have a choice: the audio files I get have metadata
written in XML mixed with binary audio data. The line endings are
"correct" within the XML. I have to deal with the output produced by
various recorders.

I've uploaded to <http://thoraval.yvon.free.fr...

a *** truncated *** version of one of the files I'm extracting the XML
part from. The file is named "bidule-truncated.wav"; don't play it as an
audio file, because I've written:

[audio part truncated]

in the middle of the audio part to make it lighter (4k instead of MBs).

Anyway, thanks a lot for helping me with those line endings ;-)

>
> If the file is text, you can filter line endings. Use the above method.
>
> If the file is not text, do not filter anything.

Then it doesn't work...
--
une bévue

Paul Lutus

11/25/2006 6:30:00 PM

Une bévue wrote:

> Paul Lutus <nospam@nosite.zzz> wrote:
>
>>
>> Okay, the first ("\r") might be old-style Macintosh line endings. Here is
>> a solution for all the possibilities:
>>
>> data.gsub!(%r{(\r\n|\n\r|\r)},"\n")
>>
>> >
>> > Does \n\r exist? (Wikipedia says no.)
>>
>> Doesn't matter. Someone might type it in manually. If it exists, the
>> above method will handle it.
>
> OK, thanks, I'll try that ASAP.
>
>> >
>> > Also, because most of the audio input file is "binary" data, line
>> > endings are meaningless there, I suppose.
>>
>> What? You are reading binary files? Then don't try to filter line
>> endings.
>
> BUT I DON'T have a choice: the audio files I get have metadata
> written in XML mixed with binary audio data. The line endings are
> "correct" within the XML. I have to deal with the output produced by
> various recorders.

If you read a file that is part text and part binary, DO NOT filter line
endings. Instead, write your parsing code to accommodate different line
endings on the fly. One way to do this is to read a specific block size
from the file (by detecting a delimiter that separates the text from the
binary parts), work on that block, then reattach the block to the file.
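
One way this could look, as a rough sketch only: it assumes the <BWFXML> and </BWFXML> tags from the original question can serve as the delimiters and that each occurs once. The helper name extract_bwfxml and the sample bytes are invented; in practice the raw string would come from something like File.binread(path), which reads the file without any newline translation.

```ruby
# Hypothetical helper: returns the text between the <BWFXML> tags with its
# line endings normalized, or nil if either tag is missing.
def extract_bwfxml(raw)
  open_tag  = "<BWFXML>"
  close_tag = "</BWFXML>"

  start = raw.index(open_tag)
  return nil unless start
  stop = raw.index(close_tag, start)
  return nil unless stop

  # Slice out only the text block; the binary bytes around it are never touched.
  xml = raw[start...(stop + close_tag.length)]
  xml.gsub(/\r\n|\n\r|\r/, "\n")
end

# Invented stand-in for a real file: binary bytes (including stray CR/LF
# bytes that are audio data, not line endings) around a small XML block.
raw = ("RIFF\x00\x01\r\xFF" "<BWFXML>\r\n<A>1</A>\r</BWFXML>" "\x00\r\n\x02").b
xml = extract_bwfxml(raw)
puts xml
```

The point of slicing first is that the \r and \n bytes inside the audio data are left alone; only the extracted copy of the text is normalized.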

>
> I've uploaded to <http://thoraval.yvon.free.fr...
>
> a *** truncated *** version of one of the files I'm extracting the XML
> part from. The file is named "bidule-truncated.wav"; don't play it as an
> audio file, because I've written:
>
> [audio part truncated]
>
> in the middle of the audio part to make it lighter (4k instead of MBs).
>
> Anyway, thanks a lot for helping me with those line endings ;-)
>
>>
>> If the file is text, you can filter line endings. Use the above method.
>>
>> If the file is not text, do not filter anything.
>
> Then it doesn't work...

Treat the text part differently than the binary part. Read the entire file,
split it up based on some kind of delimiters, edit the text part, recombine
the separated parts, save the file.
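
A sketch of that split / edit / recombine sequence, under the same assumption that the <BWFXML> tags mark the text part. The helper name replace_bwfxml and the sample bytes are invented for illustration. Note one caveat the sketch does not handle: if the edited text has a different length and the container records chunk sizes (as RIFF/WAV does), those size fields would need updating as well.

```ruby
# Hypothetical helper: returns a copy of raw with the <BWFXML>...</BWFXML>
# block replaced by new_xml; raw is returned unchanged if the tags are missing.
def replace_bwfxml(raw, new_xml)
  open_tag  = "<BWFXML>"
  close_tag = "</BWFXML>"

  start = raw.index(open_tag)
  stop  = start && raw.index(close_tag, start)
  return raw unless stop

  head = raw[0...start]                        # binary bytes before the text
  tail = raw[(stop + close_tag.length)..-1]    # binary bytes after the text
  head + new_xml.b + tail                      # recombine, keeping bytes as-is
end

# Invented stand-in for File.binread(path).
raw     = ("\xFF\r\x00" "<BWFXML>\r\n<A>1</A>\r\n</BWFXML>" "\x00\r\n\xFE").b
new_xml = "<BWFXML>\n<A>1</A>\n</BWFXML>"
result  = replace_bwfxml(raw, new_xml)
p result
```

The result could then be written back with File.binwrite(path, result), which, like File.binread, performs no newline translation.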

BTW, how is the binary data mixed with the text data? Is this an XML file
that uses the CDATA blocking convention? That scheme is quite manageable.

--
Paul Lutus
http://www.ara...