Asp Forum - more search and replace

ishamid

12/2/2006 6:48:00 PM

[Total novice]

A follow-up to my last email ("search and replace"). I am trying to
convert an OOo XML source (content.xml) to TeX. It's a bibliography and
thus very predictable/regular/simple etc. Each entry looks roughly like
this (simplified):

====================================
<text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr3"
text:name="AutoNr" text:formula="ooow:AutoNr+1"
style:num-format="1">4</text:sequence></text:p>
<text:p text:style-name="Standard">Ben</text:p>
<text:p text:style-name="reference">
<text:span text:style-name="T10">Article</text:span>.,
<text:span text:style-name="Style2">Journal</text:span>,
volume, issue, year.
</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
====================================

I. Line one was discussed in my last email. Basically, each line of this
type (the numbers vary) needs to be converted to

====
\head
====
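For step I, a minimal sketch with a single `gsub`, assuming every ID paragraph has exactly the shape shown above (a `[` followed by one `text:sequence` holding the number). The constant `ID_PARA` and the method `convert_ids` are hypothetical names, not part of the script below.

```ruby
# Step I: replace each numbered ID paragraph with \head.
# Assumes the shape shown above; [^>]* also matches line breaks,
# so a start tag wrapped over several lines is still found.
ID_PARA = %r{<text:p text:style-name="ID">\[?<text:sequence[^>]*>\d+</text:sequence>\]?</text:p>}

def convert_ids(xml)
  # block form, so the backslash in '\head' is copied literally
  xml.gsub(ID_PARA) { '\head' }
end

sample = <<~XML
  <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr3"
  text:name="AutoNr" text:formula="ooow:AutoNr+1"
  style:num-format="1">4</text:sequence></text:p>
XML

puts convert_ids(sample)   # prints \head
```

Because the number is matched with `\d+`, any sequence value (4, 17, 123...) is handled by the same pattern.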

II.
====================================
<text:p text:style-name="P6">Jim</text:p>
<text:p text:style-name="P8">Michael</text:p>
<text:p text:style-name="Standard">Ben</text:p>
====================================

Replace each with the name followed by a blank line:

====================================
Jim

Michael

Ben
====================================
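Step II can be sketched the same way. This assumes a name paragraph contains plain text only (no nested tags) and that the relevant style names are the three from the example; `NAME_STYLES` and `names_to_text` are hypothetical.

```ruby
# Style names taken from the example above; extend the list as needed.
NAME_STYLES = %w[P6 P8 Standard]

# <text:p text:style-name="P6">Jim</text:p>  ->  "Jim\n\n"
def names_to_text(xml)
  styles = NAME_STYLES.map { |s| Regexp.escape(s) }.join('|')
  # [^<]* keeps the match to paragraphs without nested markup;
  # \n? swallows the line break after the closing tag
  xml.gsub(%r{<text:p text:style-name="(?:#{styles})">([^<]*)</text:p>\n?}) do
    "#{$1.strip}\n\n"
  end
end

sample = <<~XML
  <text:p text:style-name="P6">Jim</text:p>
  <text:p text:style-name="P8">Michael</text:p>
  <text:p text:style-name="Standard">Ben</text:p>
XML

print names_to_text(sample)
```

Restricting `[^<]*` to tag-free content means paragraphs such as the "reference" one, which contain spans, are left for a later step.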

III. <text:span text:style-name="T10">Article</text:span>

If the style-name is "T10", the argument should become, e.g., {\bf
Article}; if the style-name is "Style2", it should become, e.g., {\it
Journal}.
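In the posted script, span styles are handled by the `inline` hash (style name mapped to a prefix/suffix pair), so for this bibliography the settings would presumably be `doc.inline['T10'] = ['{\bf ','}']` and `doc.inline['Style2'] = ['{\it ','}']`. A self-contained sketch of the same lookup, with the hypothetical names `INLINE` and `convert_spans`:

```ruby
# Span style name => [prefix, suffix]; mirrors the script's doc.inline hash.
INLINE = {
  'T10'    => ['{\bf ', '}'],
  'Style2' => ['{\it ', '}'],
}

def convert_spans(xml)
  xml.gsub(%r{<text:span text:style-name="([^"]*)">(.*?)</text:span>}m) do
    style, text = $1, $2
    prefix, suffix = INLINE.fetch(style, ['', ''])  # unknown styles: plain text
    prefix + text + suffix
  end
end

line = '<text:span text:style-name="T10">Article</text:span>., ' \
       '<text:span text:style-name="Style2">Journal</text:span>, volume, issue, year.'
puts convert_spans(line)
# prints {\bf Article}., {\it Journal}, volume, issue, year.
```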

IV. So the final output should be something like

====================================
\head Ben

{\bf Article}, {\it Journal}, volume, issue, year.

====================================
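Putting I to III together, here is a rough regexp-only sketch that turns the simplified entry at the top of this post into the target output. It assumes the markup looks exactly like that sample; `SPAN_STYLES` and `to_context` are hypothetical names, and the order of the substitutions matters (spans and empty paragraphs are handled before whole paragraphs are unwrapped).

```ruby
# Span style => [prefix, suffix], as in step III.
SPAN_STYLES = {
  'T10'    => ['{\bf ', '}'],
  'Style2' => ['{\it ', '}'],
}

def to_context(xml)
  out = xml.dup
  # I: ID paragraph -> "\head "; \s* also eats the line break, so the
  #    name paragraph that follows ends up on the same line
  out.gsub!(%r{<text:p text:style-name="ID">\[?<text:sequence[^>]*>\d+</text:sequence>\]?</text:p>\s*}) { '\head ' }
  # III: spans -> {\bf ...} / {\it ...}; unknown styles keep plain text
  out.gsub!(%r{<text:span text:style-name="([^"]*)">(.*?)</text:span>}m) do
    style, text = $1, $2
    prefix, suffix = SPAN_STYLES.fetch(style, ['', ''])
    prefix + text + suffix
  end
  # empty paragraphs like <text:p text:style-name="reference"/> go first,
  # otherwise the next pattern would treat them as opening tags
  out.gsub!(%r{<text:p[^>]*/>}, '')
  # II: each remaining paragraph -> its text on one line + blank line
  out.gsub!(%r{<text:p[^>]*>(.*?)</text:p>}m) { $1.split.join(' ') + "\n\n" }
  # collapse runs of blank lines and trim the ends
  out.gsub(/\n\s*\n\s*/, "\n\n").strip + "\n"
end

entry = <<~XML
  <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr3"
  text:name="AutoNr" text:formula="ooow:AutoNr+1"
  style:num-format="1">4</text:sequence></text:p>
  <text:p text:style-name="Standard">Ben</text:p>
  <text:p text:style-name="reference">
  <text:span text:style-name="T10">Article</text:span>.,
  <text:span text:style-name="Style2">Journal</text:span>,
  volume, issue, year.
  </text:p>
  <text:p text:style-name="reference"/>
  <text:p text:style-name="reference"/>
XML

print to_context(entry)
```

The `.,` after Article is carried over from the sample markup; if content.xml really contains it, one more substitution (for instance `out.gsub!('.,', ',')` before the last line of `to_context`) would produce the target line exactly.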

I hope to get enough info here to be able to finish this myself. I
assume finishing my script would only take one of you guys 15 or 20
minutes ;-) If I'm not able to get things working quickly (trying to
learn Ruby and do my work at the same time) I will be happy to pay one
of you for an hour or so of work (I'm up against a deadline).

THANK YOU
Idris

PS For reference, here is the script I'm trying to modify for this OOo
bibliography:

=====================================
class OpenOffice

  # using an xml parser is overkill and we need to regexp anyway

  attr_accessor :display, :inline, :translate

  def initialize
    @data = nil
    @file = ''
    @display = Hash.new
    @inline = Hash.new
    @translate = Hash.new
  end

  def load(filename)
    if not filename.empty? and FileTest.file?(filename) then
      begin
        @data, @file = IO.read(filename), filename
      rescue
        @data, @file = nil, ''
      end
    else
      @data, @file = nil, ''
    end
  end

  def save(filename='')
    if filename.empty? then
      filename = "clean-#{@file}"
    end
    # the block form closes the file automatically
    File.open(filename, 'w') do |f|
      f.puts(@data)
    end
  end

  def convert
    @translations = Hash.new
    @translate.each do |k,v|
      @translations[/#{k}/] = v
    end
    if @data then
      @data.gsub!(/<\?.*?\?>/) do
        # remove
      end
      @data.gsub!(//) do
        # remove
      end
      @data.gsub!(//) do
        # remove
      end
      # /m makes . match newlines (in Ruby, /s would select the
      # Windows-31J encoding instead)
      @data.gsub!(/.*?<(office:text).*?>(.*?)<\/\1>.*/mi) do
        '\starttext' + "\n" + $2 + "\n" + '\stoptext'
      end

      @data.gsub!(/<(office:font-face-decls|office:automatic-styles|text:sequence-decls).*?>.*?<\/\1>/mi) do
        # remove
      end

      # "do" must stay on the same line as the call, or Ruby reports a syntax error
      @data.gsub!(/<text:span.*?text:style-name=([\'\"])(.*?)\1>(.*?)<\/text:span>/) do
        tag, text = $2, $3
        if inline[tag] then
          (inline[tag][0]||'') + clean_display(text) + (inline[tag][1]||'')
        else
          clean_display(text)
        end
      end
      @data.gsub!(/<text:p[^>]*?\/>/) do
        # remove
      end

      @data.gsub!(/<text:p.*?text:style-name=([\'\"])(.*?)\1>(.*?)<\/text:p>/) do
        tag, text = $2, $3
        if display[tag] then
          "\n" + (display[tag][0]||'') + clean_inline(text) + (display[tag][1]||'') + "\n"
        else
          "\n" + clean_inline(text) + "\n"
        end
      end
      @data.gsub!(/\t/,' ')
      @data.gsub!(/^ +$/,'')
      @data.gsub!(/\n\n+/,"\n\n")
    end
  end

  def clean_display(str)
    str.gsub!(/"(.*?)"/) do
      '\quotation {' + $1 + '}'
    end
    str
  end

  def clean_inline(str)
    @translations.each do |k,v|
      str.gsub!(k,v)
    end
    str
  end

end

def convert(filename)

  doc = OpenOffice.new

  doc.display['P1'] = ['\chapter{','}']
  doc.display['P2'] = ['\startparagraph'+"\n","\n"+'\stopparagraph']
  doc.display['P3'] = doc.display['P2']

  doc.inline['T1'] = ['','']
  doc.inline['T2'] = ['{\sl ','}']

  doc.translate['¬'] = 'XX'
  doc.translate['‘'] = '`'   # assumed: typographic opening quote -> TeX open quote

  doc.load(filename)

  doc.convert

  doc.save
end

filename = ARGV[0]

filename = 'content.xml' if not filename or filename.empty?

convert(filename)
=====================================

3 Answers

Jeremy McAnally

12/2/2006 6:58:00 PM

Are you using OOo 2.0.4? I know it has a TeX/BibTeX export feature now...

It's not Ruby, but it should work (unless you're using this with some
sort of automated system). :)

--Jeremy

ishamid

12/2/2006 7:22:00 PM

Hi Jeremy,

On Dec 2, 11:57 am, "Jeremy McAnally" <jeremymcana...@gmail.com>
wrote:
> Are you using OOo 2.0.4? I know it has a TeX/BibTeX export feature now...

Wow, I did not know this, but...

> It's not Ruby, but it should work (unless you're using this with some
> sort of automated system). :)

I use ConTeXt, not LaTeX, and the two are really different, so...

I am sending a note to the ConTeXt developers list about this; maybe
some of them can port the OOo LaTeX filters to ConTeXt. In the meantime
I think it's best to finish that script...

Thank you very much for letting me know about OOo and LaTeX!

Best
Idris

ishamid

12/2/2006 7:48:00 PM

I checked it out; the exported LaTeX is way too messy for my purposes. It
will be much easier to convert the XML to ConTeXt than the LaTeX to ConTeXt.

Thnx again
Idris

comp.lang.ruby
