comp.lang.ruby

Newbie: working with a text file and converting to xml

Adam Teale

12/6/2006 7:49:00 AM

hi Guys,

I have a tab-delimited text file that I would like to convert into an
xml file that can be read/imported into Apple's Final Cut Pro.

The text file is 2 columns.
The first column is the time (timecode)
The second column is text (for sub-titling)

I thought this might be a good starting project to get into Ruby

Any suggestions on how I might approach this?

Thanks!

Adam Teale

--
Posted via http://www.ruby-....

32 Answers

Kev Jackson

12/6/2006 8:06:00 AM


> I have a tab-delimited text file that I would like to convert into an
> xml file that can be read/imported into Apple's Final Cut Pro.
>
> The text file is 2 columns.
> The first column is the time (timecode)
> The second column is text (for sub-titling)
>
> I thought this might be a good starting project to get into Ruby
>
> Any suggestions on how I might approach this?

Look at Builder (Builder::XmlMarkup) and FasterCSV.

Set up FasterCSV to use a tab as the delimiter instead of the comma,
use it to read the input, and then use Builder to output
<timecode>data</timecode><subtitle>data</subtitle>

This should be fairly simple. Alternatively, you can avoid libraries and
do it yourself to learn more about Ruby without getting bogged down in
third-party libs.

require 'builder'  # third-party gem
x = Builder::XmlMarkup.new(:target => $stdout, :indent => 1)
x.instruct!
x.timecode data
x.subtitle data

etc
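
A minimal sketch of the do-it-yourself route, assuming each input line is `timecode<TAB>text`. The names `XML_ESCAPES`, `escape_xml` and `line_to_xml` are made up for illustration. The one thing it adds to the snippets in this thread is escaping of `&`, `<` and `>`, which would otherwise produce malformed XML if they ever appear in a subtitle.

```ruby
# Characters that must be escaped inside XML text content.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Hypothetical helper: make text safe to place inside an element.
def escape_xml(text)
  text.gsub(/[&<>]/, XML_ESCAPES)
end

# Hypothetical helper: one tab-delimited line -> one pair of elements.
# The limit of 2 keeps any further tabs inside the subtitle text.
def line_to_xml(line)
  timecode, subtitle = line.chomp.split("\t", 2)
  "<timecode>#{escape_xml(timecode.to_s)}</timecode>" \
    "<subtitle>#{escape_xml(subtitle.to_s)}</subtitle>"
end

puts line_to_xml("00:00:42:21\tDurbar Square & Market")
```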

Kev

Peter Szinek

12/6/2006 8:28:00 AM


Adam Teale wrote:
> hi Guys,
>
> I have a tab-delimited text file that I would like to convert into an
> xml file that can be read/imported into Apple's Final Cut Pro.
>
> The text file is 2 columns.
> The first column is the time (timecode)
> The second column is text (for sub-titling)

Could you send us 2 example files? I guess the text file format is
obvious (but better to work with a real-life example) but I am not so
sure about the Final Cut Pro XML (or is it just a plain simple XML?)

Until then, check out this code:

============================================================
input = <<INPUT
0.12 Salut, Foo!
0.15 Hola Bar! Did you see Baz?
0.22 I guess he is hanging around with Fluff and Ork.
INPUT

template = <<TEMPLATE
<timecode>TIMECODE</timecode>
<sub-titling>SUB-TITLING</sub-titling>
TEMPLATE

result = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"

input.split(/\n/).each do |line|
  data = line.split(/\t/)
  result += template.sub('TIMECODE'){data[0]}.sub('SUB-TITLING'){data[1]}
end

result += '</xml>'

puts result
============================================================

output:

<?xml version="1.0" encoding="ISO-8859-1"?>
<timecode>0.12</timecode>
<sub-titling>Salut, Foo!</sub-titling>
<timecode>0.15</timecode>
<sub-titling>Hola Bar! Did you see Baz?</sub-titling>
<timecode>0.22</timecode>
<sub-titling>I guess he is hanging around with Fluff and Ork.</sub-titling>
</xml>


Cheers,
Peter

__
http://www.rubyra...


Adam Teale

12/6/2006 8:52:00 AM


Hi Kev & Peter!

Thanks for responding so quickly!

The text file looks pretty much like that

00:00:30:13 Swayambhunath Temple: building started 460AD
00:00:42:21 Durbar Square
00:01:05:06 Driving to Trisuli River for Rafting
00:01:55:22 Day 1 Trekking: Pokhara to Tirkhedhunga (1540m)
00:02:20:20 Day 2 Trekking: Tirkhedhunga to Ghorephani (2750m)
00:02:33:19 Day 3 Trekking: Ghorephani to Ghandruk (1940m)
00:02:42:04 Day 4 Trekking: Ghandruk to Pothana (1900m)
00:03:10:13 Day 5 Trekking: Pothana to Phedi (1130m)

It'll take a while for your example to filter down into my brain - when
it does I'll get back to you about it.

Awesome!

Thank you so much!

Adam



--
Posted via http://www.ruby-....

Paul Lutus

12/6/2006 8:53:00 AM


Peter Szinek wrote:

> Adam Teale wrote:
>> hi Guys,
>>
>> I have a tab-delimited text file that I would like to convert into an
>> xml file that can be read/imported into Apple's Final Cut Pro.
>>
>> The text file is 2 columns.
>> The first column is the time (timecode)
>> The second column is text (for sub-titling)
>
> Could you send us 2 example files? I guess the text file format is
> obvious (but better to work with a real-life example) but I am not so
> sure about the Final Cut Pro XML (or is it just a plain simple XML?)
>
> Until then, check out this code:
>
> ============================================================
> input = <<INPUT
> 0.12 Salut, Foo!
> 0.15 Hola Bar! Did you see Baz?
> 0.22 I guess he is hanging around with Fluff and Ork.
> INPUT
>
> template = <<TEMPLATE
> <timecode>TIMECODE</timecode>
> <sub-titling>SUB-TITLING</sub-titling>
> TEMPLATE
>
> result = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
>
> input.split(/\n/).each do |line|
>   data = line.split(/\t/)
>   result += template.sub('TIMECODE'){data[0]}.sub('SUB-TITLING'){data[1]}
> end
>
> result += '</xml>'
>
> puts result
> ============================================================
>
> output:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <timecode>0.12</timecode>
> <sub-titling>Salut, Foo!</sub-titling>
> <timecode>0.15</timecode>
> <sub-titling>Hola Bar! Did you see Baz?</sub-titling>
> <timecode>0.22</timecode>
> <sub-titling>I guess he is hanging around with Fluff and Ork.</sub-titling>
> </xml>

I think it would be better XML style to have each timecode/subtitling data
set enclosed in a grouping tag pair, like <item> ... </item>, or some
similar name. This is more or less the XML equivalent of a record
delimiter.

This isn't a requirement in any sense of the word, because the generated XML
is valid, but more a way to recognize the natural grouping of the data.
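
One way to sketch that grouping. A well-formed XML document is also expected to have exactly one root element, so the sketch wraps all the items in one. The tag names `<subtitles>` and `<item>` and the helper `build_document` are placeholders, not Final Cut Pro's actual schema, which this thread does not show.

```ruby
# Hypothetical helper: wrap [timecode, subtitle] pairs in <item> elements
# under a single root element (all tag names are placeholders).
def build_document(rows)
  items = rows.map do |timecode, subtitle|
    "  <item><timecode>#{timecode}</timecode><subtitle>#{subtitle}</subtitle></item>"
  end
  [
    '<?xml version="1.0" encoding="ISO-8859-1"?>',
    "<subtitles>",
    *items,
    "</subtitles>"
  ].join("\n") + "\n"
end

puts build_document([["0.12", "Salut, Foo!"], ["0.15", "Hola Bar! Did you see Baz?"]])
```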

--
Paul Lutus
http://www.ara...

Paul Lutus

12/6/2006 9:07:00 AM


Peter Szinek wrote:

> </xml>

I just noticed this. The XML declaration (<?xml ... ?>) has no closing
tag, so the trailing </xml> should be removed. The declaration is an
exception to the strict rule about tag formatting in XML (that a tag is
either <self-closing/> or has a <closing> ... </closing> partner).

--
Paul Lutus
http://www.ara...

Paul Lutus

12/6/2006 9:08:00 AM


Adam Teale wrote:

> Hi Kev & Peter!
>
> Thanks for responding so quickly!
>
> The text file looks pretty much like that
>
> 00:00:30:13 Swayambhunath Temple: building started 460AD
> 00:00:42:21 Durbar Square
> 00:01:05:06 Driving to Trisuli River for Rafting
> 00:01:55:22 Day 1 Trekking: Pokhara to Tirkhedhunga (1540m)
> 00:02:20:20 Day 2 Trekking: Tirkhedhunga to Ghorephani (2750m)
> 00:02:33:19 Day 3 Trekking: Ghorephani to Ghandruk (1940m)
> 00:02:42:04 Day 4 Trekking: Ghandruk to Pothana (1900m)
> 00:03:10:13 Day 5 Trekking: Pothana to Phedi (1130m)
>
> It'll take a while for your example to filter down into my brain - when
> it does I'll get back to you about it.

--------------------------------------------

#!/usr/bin/ruby -w

data =<<__EOL__
00:00:30:13 Swayambhunath Temple: building started 460AD
00:00:42:21 Durbar Square
00:01:05:06 Driving to Trisuli River for Rafting
00:01:55:22 Day 1 Trekking: Pokhara to Tirkhedhunga (1540m)
00:02:20:20 Day 2 Trekking: Tirkhedhunga to Ghorephani (2750m)
00:02:33:19 Day 3 Trekking: Ghorephani to Ghandruk (1940m)
00:02:42:04 Day 4 Trekking: Ghandruk to Pothana (1900m)
00:03:10:13 Day 5 Trekking: Pothana to Phedi (1130m)
__EOL__

output = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"

# Build one <item> element per input line.
data.each_line do |line|
  timecode, subtitle = line.strip.split("\t")
  xml = "<item><timecode>#{timecode}</timecode><subtitle>#{subtitle}</subtitle></item>"
  output += xml + "\n"
end

File.open("output.xml","w") { |f| f.write output }

--------------------------------------------

The data block at the top can easily be replaced with a file reader line:

data = File.read(filename)
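
A sketch of that file-based variant, assuming the input uses Unix-style line endings and a tab between the two columns. `convert_file` and its parameters are hypothetical names; it reads the input line by line with `File.foreach` rather than loading it in one go.

```ruby
# Hypothetical helper: read a tab-delimited file line by line and write
# one <item> element per non-blank line to out_path.
def convert_file(in_path, out_path)
  File.open(out_path, "w") do |out|
    out.puts '<?xml version="1.0" encoding="ISO-8859-1"?>'
    File.foreach(in_path) do |line|
      timecode, subtitle = line.strip.split("\t", 2)
      next if timecode.nil? # skip blank lines
      out.puts "<item><timecode>#{timecode}</timecode><subtitle>#{subtitle}</subtitle></item>"
    end
  end
end
```

It could then be called as `convert_file("mytextfile.txt", "output.xml")`, where both file names are only examples.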

--
Paul Lutus
http://www.ara...

Peter Szinek

12/6/2006 9:15:00 AM


Adam Teale wrote:
> The text file looks pretty much like that

Then it should be fine, as long as there are no tabs in the second
column. Even that would not be an unsolvable problem, but it
would not work with the code I sent you.

> It'll take a while for your example to filter down into my brain - when
> it does I'll get back to you about it.

Sure!

>
> Awesome!
Yeah, Ruby is awesome! I am a beginner, too (picked up Ruby a few months
ago) and though I have very limited time to learn it, I can do a lot of
things already. The learning curve is really steep.

Cheers,
Peter

__
http://www.rubyra...

Adam Teale

12/6/2006 11:37:00 AM


Hi Peter,

I saved your code and called it convert.rb. I ran it (replacing
'filename' with the path of my text file - was that right to do?)

i got this error:
convert.rb:1: unknown regexp options - atal

any ideas?

Also, do you know if there is any way to run a script from the
command line like this?
./convert.rb mytextfile.txt
I made a shell script that did this kind of thing: it took the input
file as something like $ARGV (I think; sorry, I'm a super newbie!).
Does that make sense?

Thanks Peter!

Adam



--
Posted via http://www.ruby-....

Peter Szinek

12/6/2006 11:50:00 AM


Adam Teale wrote:
> Hi Peter,
>
> I saved your code and called it convert.rb. I ran it (replacing
> 'filename' with the path of my text file - was that right to do?)
>
> i got this error:
> convert.rb:1: unknown regexp options - atal
>
> any ideas?
I guess you are referring to Paul's solution since I did not use any
files :) In any case, could you paste the code here (convert.rb) so I
can check what's going on?

> also, do you know if there is any way to run a script from the
> commandline like?:
> ./convert.rb mytextfile.txt

Sure. The array called ARGV contains all the command-line arguments.

------ test.rb
#!/usr/bin/ruby
puts ARGV[0]
puts ARGV[1]
------

./test.rb foo bar

will output

----
foo
bar
----
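
A slightly more defensive sketch of the same idea. `first_argument` is a hypothetical helper; it takes the argument list as a parameter so it can be tried with a plain array, and in a real script one would pass `ARGV` instead.

```ruby
# Hypothetical helper: return the first command-line argument,
# or raise a usage error when none was given.
def first_argument(args)
  raise ArgumentError, "usage: convert.rb INPUT_FILE" if args.empty?
  args.first
end

# In convert.rb this call would be first_argument(ARGV).
puts first_argument(["mytextfile.txt"])
```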

Cheers,
Peter

__
http://www.rubyra...

Adam Teale

12/6/2006 12:29:00 PM


doh! Sorry guys!

Peter - thanks for the ARGV tips!

I think i have Paul's script going using the ARGV
---------------------------------------------------
#!/usr/bin/ruby -w

data = File.read(ARGV[0])

output = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"

data.each_line do |line|
  timecode, subtitle = line.strip.split("\t")
  xml = "<item><timecode>#{timecode}</timecode><subtitle>#{subtitle}</subtitle></item>"
  output += xml + "\n"
end

File.open("output.xml","w") { |f| f.write output }
---------------------------------------------------


However it only outputs the first line from my txt file:
---------------------------------------------------
<?xml version="1.0" encoding="ISO-8859-1"?>
<item><timecode>00:00:30:13</timecode><subtitle>Swayambhunath Temple:
building started 460AD
00:00:42:21</subtitle></item>
---------------------------------------------------
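
One possible explanation, offered as a guess since the input file itself is not shown: if the text file was saved with classic Mac line endings (a lone carriage return, `\r`), Ruby sees the whole file as a single line, and the multiple assignment from `split("\t")` keeps only the first two fields. That would match the second timecode ending up inside the first subtitle. The sketch below splits on any of the three common line endings; `records_from` is a hypothetical name.

```ruby
# Hypothetical helper: split text on \r\n, \r or \n and return
# [timecode, subtitle] pairs, ignoring empty lines.
def records_from(text)
  text.split(/\r\n|\r|\n/).reject(&:empty?).map do |line|
    line.split("\t", 2)
  end
end

# Sample data using lone carriage returns as line endings.
mac_style = "00:00:30:13\tSwayambhunath Temple\r00:00:42:21\tDurbar Square\r"
records_from(mac_style).each do |timecode, subtitle|
  puts "<item><timecode>#{timecode}</timecode><subtitle>#{subtitle}</subtitle></item>"
end
```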

Apologies for my newbieness!

Cheers guys!

Adam




--
Posted via http://www.ruby-....