comp.lang.ruby

OOo and regexp

ishamid

12/3/2006 4:58:00 PM

[novice]
Hi,

Paul Lutus suggested that I give more detail about my problem. Ok,
here it is:

BACKGROUND:
Save a small document in OOo format, like a bibliographic entry with,
say, the article title in bold and the journal in italics
- under options - save -> disable xml size optimization
- save the file
- copy the file to a subdirectory
- run "unzip filename"

content.xml has the data we want to convert to TeX. A sample
content.xml is given at the end of this message, after the script.

RUBY: I have a script provided by a colleague that does a lot of the
work needed to convert this to a sane ConTeXt file. I am trying to
teach myself enough ruby to edit this script as needed for academic
articles (I edit an academic journal in TeX). The script is reproduced
at the end of this message.

PROBLEMS: Yesterday I did learn about regexp and made progress, though
the script is still buggy:

i) In the script (l. 110--112) I have

===========
str.gsub!(/&quot;(.*?)&quot;/) do
'\quotation {' + $1 + '}'
end
===========

but on line 114 of content.xml the &quot; pair is not converted, though
it is converted elsewhere.

ii) (really weird) In the script (l. 45--47) I have

============
@data.gsub!(/\[<(text:sequence
text:ref-name="refAutoNr0").*?>.*?<\/text:sequence>/mois) do
'\startitemize' + '\head'
end
============

This apparently works fine. Now I want some linespace between
'\startitemize' & '\head', so I put a "\n\n" in between them. This
causes the xml tags to appear in the output file like this

============
<text:p text:style-name="ID">\startitemize

\head</text:p>
============

iii) any tips for improving this script are appreciated. I'm sure I'll
have more questions over the next couple of days as I work on this.

Thank you all in advance for any help or pointers for this novice :-)

Best
Idris

================idris.rb==============
class OpenOffice

# using an xml parser is overkill and we need to regexp anyway

attr_reader :display, :inline, :translate
attr_writer :display, :inline, :translate

def initialize
@data = nil
@file = ''
@display = Hash.new
@inline = Hash.new
@translate = Hash.new
end

def load(filename)
if not filename.empty? and FileTest.file?(filename) then
begin
@data, @file = IO.read(filename), filename
rescue
@data, @file = nil, ''
end
else
@data, @file = nil, ''
end
end

def save(filename='')
if filename.empty? then
filename = "clean-#{@file}.tex"
end
if f = open(filename,'w') then
f.puts(@data)
f.close
end
end

def convert
@translations = Hash.new
@translate.each do |k,v|
@translations[/#{k}/] = v
end
if @data then
@data.gsub!(/\[<(text:sequence
text:ref-name="refAutoNr0").*?>.*?<\/text:sequence>/mois) do
'\startitemize' + "\n\n" + '\head' # + "\n\n"
end
@data.gsub!(/\[<\/(text:span)><(text:sequence
text:ref-name="refAutoNr[^0].*?").*?>.*?<\/text:sequence>/mois) do
'\head'
end
@data.gsub!(/\[<(text:sequence
text:ref-name="refAutoNr[^0].*?").*?>.*?<\/text:sequence>/mois) do
'\head'
end
@data.gsub!(/.*?<(office:text).*?>(.*?)<\/\1>.*/mois) do
'\enableregime[utf]' + "\n" + '\useencoding[cyr]' + "\n\n" +
'\definetypeface [russian]' + "\n" + ' ' + '[rm] [serif] [computer-modern] [default]
[encoding=t2a]' + "\n\n" + '\starttext'+ "\n\n" + '\switchtobodyfont[russian]' + "\n" + $2 +
"\n" + '\stopitemize' + "\n\n" + '\stoptext'
end

@data.gsub!(/<(office:font-face-decls|office:automatic-styles|text:sequence-decls).*?>.*?<\/\1>/mois) do
# remove
end
# @data.gsub!(/<(text:span text:style-name="T10")>(.*?)<\/text:span>/mois) do
# '{' + '\bf ' + $2 + '}'
# end
# @data.gsub!(/<(text:span text:style-name="Style2")>(.*?)<\/text:span>/mois) do
# '{' + '\it ' + $2 + '}'
# end

@data.gsub!(/<text:span.*?text:style-name=([\'\"])(.*?)\1>(.*?)<\/text:span>/) do
tag, text = $2, $3
if inline[tag] then
(inline[tag][0]||'') + clean_display(text) +
(inline[tag][1]||'')
else
clean_display(text)
end
end
@data.gsub!(/<text:span.*?text:style-name=(".*?")>/) do
# remove
end
@data.gsub!(/<\?.*?\?>/) do
# remove
end
@data.gsub!(/<!--.*?-->/) do
# remove
end
@data.gsub!(/<text:p[^>]*?\/>/) do
# remove
end

@data.gsub!(/<text:p.*?text:style-name=([\'\"])(.*?)\1>(.*?)<\/text:p>/) do
tag, text = $2, $3
if display[tag] then
"\n" + (display[tag][0]||'') + clean_inline(text) +
(display[tag][1]||'') + "\n"
else
"\n" + clean_inline(text) + "\n"
end
end
@data.gsub!(/<text:s[^>]*?\/>/) do
# remove
end
@data.gsub!(/<text:bookmark[^>]*?\/>/) do
# remove
end
@data.gsub!(/\t/,' ')
@data.gsub!(/^ +$/,'')
@data.gsub!(/\n\n+/moi,"\n\n")
end
end

def clean_display(str)
str.gsub!(/&quot;(.*?)&quot;/) do
'\quotation {' + $1 + '}'
end
str.gsub!(/&amp;/) do
'\&'
end
str
end

def clean_inline(str)
@translations.each do |k,v|
str.gsub!(k,v)
end
str
end

end

def convert(filename)

doc = OpenOffice.new

doc.display['P1'] = ['\chapter{','}']
doc.display['P2'] = ['\start'+"\n","\n"+'\stop']
doc.display['P3'] = doc.display['P2']
# doc.display['ID'] = ['\relax']

doc.inline['T1'] = ['','']
doc.inline['T2'] = ['','']
doc.inline['T3'] = ['{\bf ','}']
doc.inline['T6'] = ['{\bf ','}']
doc.inline['T8'] = ['{\bf ','}']
doc.inline['T10'] = ['{\bf ','}']
doc.inline['T11'] = ['{\bf ','}']
doc.inline['Style2'] = ['{\it ','}']

# doc.translate['¬'] = 'XX'
doc.translate['&apos;'] = '`'
doc.translate['&amp;'] = '\&'

doc.load(filename)

doc.convert

doc.save
end

filename = ARGV[0]

filename = 'content.xml' if not filename or filename.empty?

convert(filename)
===========content.xml============
<?xml version="1.0" encoding="UTF-8"?>

<office:document-content
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
xmlns:xlink="http://www.w3.org/1999/x...
xmlns:dc="http://purl.org/dc/elements/...
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
xmlns:math="http://www.w3.org/1998/Math/Ma...
xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
xmlns:ooo="http://openoffice.org/2004/of...
xmlns:ooow="http://openoffice.org/2004/wr...
xmlns:oooc="http://openoffice.org/2004/...
xmlns:dom="http://www.w3.org/2001/xml-ev...
xmlns:xforms="http://www.w3.org/2002/xf...
xmlns:xsd="http://www.w3.org/2001/XMLSc...
xmlns:xsi="http://www.w3.org/2001/XMLSchema-inst...
office:version="1.0">
<office:scripts/>
<office:font-face-decls>
<style:font-face style:name="Wingdings" svg:font-family="Wingdings"
style:font-pitch="variable" style:font-charset="x-symbol"/>
<style:font-face style:name="Symbol" svg:font-family="Symbol"
style:font-family-generic="roman" style:font-pitch="variable"
style:font-charset="x-symbol"/>
<style:font-face style:name="Tahoma2" svg:font-family="Tahoma"/>
<style:font-face style:name="Arial Unicode MS"
svg:font-family="&apos;Arial Unicode MS&apos;"
style:font-pitch="variable"/>
<style:font-face style:name="MS Mincho" svg:font-family="&apos;MS
Mincho&apos;" style:font-pitch="variable"/>
<style:font-face style:name="Tahoma1" svg:font-family="Tahoma"
style:font-pitch="variable"/>
<style:font-face style:name="Garamond" svg:font-family="Garamond"
style:font-family-generic="roman" style:font-pitch="variable"/>
<style:font-face style:name="Times New Roman"
svg:font-family="&apos;Times New Roman&apos;"
style:font-family-generic="roman" style:font-pitch="variable"/>
<style:font-face style:name="Arial" svg:font-family="Arial"
style:font-family-generic="swiss" style:font-pitch="variable"/>
<style:font-face style:name="Tahoma" svg:font-family="Tahoma"
style:font-family-generic="swiss" style:font-pitch="variable"/>
</office:font-face-decls>
<office:automatic-styles>
<style:style style:name="P1" style:family="paragraph"
style:parent-style-name="Standard"
style:master-page-name="First_20_Page">
<style:paragraph-properties fo:text-align="center"
style:justify-single-word="false"/>
</style:style>
<style:style style:name="P2" style:family="paragraph"
style:parent-style-name="Standard">
<style:paragraph-properties fo:text-align="center"
style:justify-single-word="false"/>
<style:text-properties fo:font-size="14pt" fo:font-weight="bold"
style:font-size-asian="14pt" style:font-weight-asian="bold"
style:font-size-complex="14pt" style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="P3" style:family="paragraph"
style:parent-style-name="Standard">
<style:paragraph-properties fo:text-align="center"
style:justify-single-word="false"/>
<style:text-properties fo:font-size="18pt" fo:font-weight="bold"
style:font-size-asian="18pt" style:font-weight-asian="bold"
style:font-size-complex="18pt" style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="P4" style:family="paragraph"
style:parent-style-name="Standard"
style:master-page-name="Convert_20_1"/>
<style:style style:name="P5" style:family="paragraph"
style:parent-style-name="Standard"
style:master-page-name="Convert_20_2"/>
<style:style style:name="P6" style:family="paragraph"
style:parent-style-name="Standard">
<style:text-properties style:font-name-asian="Wingdings"
style:font-size-complex="10pt"/>
</style:style>
<style:style style:name="P7" style:family="paragraph"
style:parent-style-name="reference">
<style:text-properties style:font-name-asian="Wingdings"/>
</style:style>
<style:style style:name="P8" style:family="paragraph"
style:parent-style-name="Standard">
<style:text-properties fo:language="fr" fo:country="FR"
style:font-name-asian="Wingdings" style:font-size-complex="10pt"/>
</style:style>
<style:style style:name="P9" style:family="paragraph"
style:parent-style-name="Standard">
<style:text-properties style:font-size-complex="10pt"/>
</style:style>
<style:style style:name="P10" style:family="paragraph"
style:parent-style-name="reference">
<style:text-properties fo:font-size="11pt"
style:font-size-asian="11pt" style:font-size-complex="9pt"/>
</style:style>
<style:style style:name="P11" style:family="paragraph"
style:parent-style-name="reference2">
<style:text-properties fo:font-size="11pt"
style:font-size-asian="11pt" style:font-size-complex="9pt"/>
</style:style>
<style:style style:name="T1" style:family="text">
<style:text-properties fo:font-size="21pt" fo:font-weight="bold"
style:font-size-asian="21pt" style:font-weight-asian="bold"
style:font-size-complex="21pt" style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T2" style:family="text">
<style:text-properties fo:font-size="21pt" fo:font-weight="bold"
style:font-size-asian="21pt" style:font-weight-asian="bold"
style:font-size-complex="22pt" style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T3" style:family="text">
<style:text-properties fo:font-weight="bold"
style:font-name-asian="Wingdings" style:font-weight-asian="bold"
style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T4" style:family="text">
<style:text-properties style:font-name-asian="Wingdings"/>
</style:style>
<style:style style:name="T5" style:family="text">
<style:text-properties fo:language="fr" fo:country="FR"/>
</style:style>
<style:style style:name="T6" style:family="text">
<style:text-properties fo:language="fr" fo:country="FR"
fo:font-weight="bold" style:font-name-asian="Wingdings"
style:font-weight-asian="bold" style:font-size-complex="10pt"
style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T7" style:family="text">
<style:text-properties fo:language="fr" fo:country="FR"
style:font-name-asian="Wingdings" style:font-size-complex="10pt"/>
</style:style>
<style:style style:name="T8" style:family="text">
<style:text-properties fo:font-weight="bold"
style:font-name-asian="Wingdings" style:font-weight-asian="bold"
style:font-size-complex="10pt" style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T9" style:family="text">
<style:text-properties style:font-name-asian="Wingdings"
style:font-size-complex="10pt"/>
</style:style>
<style:style style:name="T10" style:family="text">
<style:text-properties fo:font-weight="bold"
style:font-weight-asian="bold" style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T11" style:family="text">
<style:text-properties fo:font-weight="bold"
style:font-weight-asian="bold" style:font-size-complex="10pt"
style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T12" style:family="text">
<style:text-properties style:font-size-complex="10pt"/>
</style:style>
</office:automatic-styles>
<office:body>
<office:text>
<text:sequence-decls>
<text:sequence-decl text:display-outline-level="0"
text:name="Illustration"/>
<text:sequence-decl text:display-outline-level="0"
text:name="Table"/>
<text:sequence-decl text:display-outline-level="0"
text:name="Text"/>
<text:sequence-decl text:display-outline-level="0"
text:name="Drawing"/>
<text:sequence-decl text:display-outline-level="0"
text:name="AutoNr"/>
</text:sequence-decls>
<text:p text:style-name="P1"><text:span
text:style-name="T1">Isma</text:span><text:span
text:style-name="T2">&apos;</text:span><text:span
text:style-name="T1">ilis: A Bibliography</text:span></text:p>
<text:p text:style-name="P2"/>
<text:p text:style-name="P2"/>
<text:p text:style-name="P2"/>
<text:p text:style-name="P3">Compiled by:</text:p>
<text:p text:style-name="P3">Ramin Khanbagi</text:p>
<text:p text:style-name="P4"/>
<text:p text:style-name="Standard"/>
<text:p text:style-name="Standard"/>
<text:p text:style-name="P5"/>
<text:p text:style-name="Standard"/>
<text:p text:style-name="Standard"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr0" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">1</text:sequence></text:p>
<text:p text:style-name="P6">&apos;Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T3">Die al-Azhar-Moschee</text:span><text:span
text:style-name="T4">., in, </text:span><text:span
text:style-name="T3">&quot;Schätze der Kalifen: Islamische Kunst zur
Fatimidenzeit.&quot;</text:span><text:span text:style-name="T4">,
Herausgegeben von W. Seipel, Vienna: Kunsthistorisches Museum Wien;
Milan: Skira, 1998, pp. 144-147</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID"><text:span
text:style-name="T5">[</text:span><text:sequence
text:ref-name="refAutoNr1" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">2</text:sequence></text:p>
<text:p text:style-name="P8">&apos;Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T6">La mosquée al-Azhar</text:span><text:span
text:style-name="T7">., in, </text:span><text:span
text:style-name="T6">&quot;Trésors fatimides du Caire. Exposition
présentée à l&apos;Institut du Monde Arabe ...
</text:span><text:span
text:style-name="T8">1998.&quot;</text:span><text:span
text:style-name="T9">, Paris: Institut du Monde Arabe, 1998, pp.
147-149</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr2" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">3</text:sequence></text:p>
<text:p text:style-name="Standard"><text:s/>&apos;Amri, Husay
&apos;Abdallah</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">The Text of an Unpublished Fatwa of the Scholar
al-Maqbali (d. 1108/1728) Concerning the Legal Position of the
Batiniyyah (Isma&apos;iliyyah) of the People of Hamdan</text:span>.,
Translated by A.B.D.R. Eagle, <text:span text:style-name="Style2">New
Arabian Studies</text:span>, 2 (1994), pp. 165-174.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr3" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">4</text:sequence></text:p>
<text:p text:style-name="Standard">Abarahamov, Binyamin</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">An Isma&apos;ili Epistemology: The Case of
Al-Da&apos;i al-Mutlaq &apos;Ali b. Muhammad b. al-Walid</text:span>.,
<text:span text:style-name="Style2">Journal of Semitic
Studies</text:span>, 41ii (1996), pp. 263-273.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr4" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">5</text:sequence></text:p>
<text:p text:style-name="Standard">Abel, A.</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">De historische betekenis van de Loutere Broeders
van Basra (Bassorah), een wijsgerig gezelschap in de Islam van de Xe
eeuw</text:span>., <text:span text:style-name="Style2">Orientalia
Gandensia</text:span>, 1 (1964), pp. 157-170.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr5" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">6</text:sequence></text:p>
<text:p text:style-name="P9">Abou Said, A.C.</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T11">Abbasid and Fatimid Political Relations during
the Buhawid Period</text:span><text:span text:style-name="T12">.,
University of Cambridge, 1967.</text:span></text:p>
<text:p text:style-name="reference2">[<text:span
text:style-name="Style2">Dissertation</text:span>]</text:p>
<text:p text:style-name="reference2"/>
<text:p text:style-name="reference2"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr6" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">7</text:sequence></text:p>
<text:p text:style-name="P9">Abu Firas, Shihab al-Din
al-Maynaqi</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T11">Ash-Shafiya&apos;: An Isma&apos;ili
Treatise</text:span><text:span text:style-name="T12">., Edited and
Translated with an Introduction and Commentary by Sami Nasib Makarim,
Beirut: American University of Beirut, 1966.</text:span></text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr7" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">8</text:sequence></text:p>
<text:p text:style-name="P9">Abu&apos;l-Fida, al-Malik
al-Mu&apos;ayyad &apos;Imad al-Din Ismai&apos;l b. &apos;Ali</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T11">The Memoirs of a Syrian
Prince</text:span><text:span text:style-name="T12">., Translated by
Peter Malcom Holt, Wiesbaden: Franz Steiner Verlag, [Freiburger
Islamstudien], 1983.</text:span></text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID"><text:bookmark-start
text:name="a01"/>[<text:sequence text:ref-name="refAutoNr8"
text:name="AutoNr" text:formula="ooow:AutoNr+1"
style:num-format="1">9</text:sequence></text:p>
<text:p text:style-name="P9">Abu-Lughod, J.<text:bookmark-end
text:name="a01"/></text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T11">Cairo: 1001 Years of the City
Victorious</text:span><text:span text:style-name="T12">., Princeton:
Princeton University Press, 1971. </text:span></text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID"><text:bookmark-start
text:name="a02"/>[<text:sequence text:ref-name="refAutoNr9"
text:name="AutoNr" text:formula="ooow:AutoNr+1"
style:num-format="1">10</text:sequence></text:p>
<text:p text:style-name="P6">Adamji, Ebrahimji N. and Sorabji M.
Darookhanawala</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T8">Two Indian Travellers: East Africa, 1902-1905:
Being Accounts of Journeys Made by Ebrahimji N. Adamji, a Very Young
Bohra Merchant from Mombasa &amp; Sorabji M. Darookhanawala, a
Middle-Aged Parsi Engineer from Zanzibar</text:span><text:span
text:style-name="T9">., Edited by C. Salvadori and J. Aldrick, Mombasa:
Friends of Fort Jesus, 1997.</text:span></text:p>
<text:p text:style-name="P10"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr1113" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">11</text:sequence></text:p>
<text:p
text:style-name="P9">Каландаров, Тохир
Сафарбекович</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T11">РелигиознаяситуациянаПамире(кпроблемерелигиозногосинкретизма).
(Summary: The Religious Situation on the Pamirs (to the problem of
religious syncretism).)</text:span><text:span text:style-name="T12">.,
</text:span><text:span
text:style-name="Style2">Восток</text:span><text:span
text:style-name="T12">, 2000 vi, pp. 36-49;219</text:span></text:p>
<text:p text:style-name="reference2">[Ismailis in
Tajikistan.]</text:p>
<text:p text:style-name="reference2"/>
<text:p text:style-name="reference2"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr1114" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">12</text:sequence></text:p>
<text:p
text:style-name="P9">Шохуморов, Саиданвар</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T11">Исмаилизм:
традицииисовременность</text:span><text:span
text:style-name="T12">., </text:span><text:span
text:style-name="Style2">ЦентральнаяАзияиКавказ</text:span><text:span
text:style-name="T12">, 2000ii/8, pp. 128-138</text:span></text:p>
<text:p text:style-name="P11">[Also online at <text:span
text:style-name="T10">www.ca-c.org/journal-table-rus.shtml</text:span>]</text:p>
</office:text>
</office:body>
</office:document-content>

1 Answer

Paul Lutus

12/3/2006 8:14:00 PM

ishamid wrote:

> [novice]
> Hi,
>
> Paul Lutus suggested that I give more detail about my problem. Ok,
> here it is:
>
> BACKGROUND:
> Save a small document in OOo format,

Do you mean an Open Office Open Document format? The sort of data file that
typically has a suffix of ".odt" and consists of a compressed set of XML
files for various purposes?

> like a bibliographic entry with,
> say, the article title in bold and the journal in italics
> - under options - save -> disable xml size optimization
> - save the file
> - copy the file to a subdirectory
> - run "unzip filename"

I think this answers my first question.

> content.xml has the data we want to convert to TeX. A sample
> content.xml is given at the end of this message, after the script.
>
> RUBY: I have a script provided by a colleague that does a lot of the
> work needed to convert this to a sane ConTeXt file. I am trying to
> teach myself enough ruby to edit this script as needed for academic
> articles (I edit an academic journal in TeX). The script is reproduced
> at the end of this message.
>
> PROBLEMS: Yesterday I did learn about regexp and made progress, though
> the script is still buggy:
>
> i) In the script (l. 110--112) I have
>
> ===========
> str.gsub!(/&quot;(.*?)&quot;/) do
> '\quotation {' + $1 + '}'
> end
> ===========
>
> but line 114 of content.xml the &quot; pair is not converted, though
> it is converted elsewhere.

I am unable to correlate this line number with a &quot; sequence in the
corresponding line in your provided XML sample. Are the two quote sequences
on separate lines? If so, use this form:

str.gsub!(/&quot;(.*?)&quot;/m) do
'\quotation {' + $1 + '}'
end

Note the added 'm'. This won't work if you are parsing the file line by line
and if the two &quot; sequences are on different XML lines.

If (1) you have two &quot; sequences on different lines, and if (2) you are
processing the XML content line by line, then you will have to change how
you process the file in the most fundamental way to get this particular TeX
conversion to work.
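
A minimal sketch of the difference the 'm' flag makes, assuming the whole
text is held in one string (as the posted script does with IO.read). The
sample string is made up for illustration and is not taken from the real
content.xml:

```ruby
# Hypothetical sample: the two &quot; entities sit on different lines.
sample = "&quot;Two Indian\nTravellers&quot;"

# Without /m, "." does not match a newline, so the pair is never found
# and the text comes back unchanged.
single_line = sample.gsub(/&quot;(.*?)&quot;/) { '\quotation {' + $1 + '}' }

# With /m, "." also matches newlines, so the pair is converted.
multi_line = sample.gsub(/&quot;(.*?)&quot;/m) { '\quotation {' + $1 + '}' }

puts single_line
puts multi_line
```

In the posted script this substitution runs inside clean_display, which
only sees the text captured by the text:span rule. That outer rule has no
'm' flag either, so a span whose text wraps across lines may never reach
clean_display at all, and a &quot; pair split across two spans cannot be
matched by it in any case.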

>
> ii) (really weird) In the script (l. 45--47) I have
>
> ============
> @data.gsub!(/\[<(text:sequence
> text:ref-name="refAutoNr0").*?>.*?<\/text:sequence>/mois) do
> '\startitemize' + '\head'
> end
> ============
>
> This apparently works fine. Now I want some linespace between
> '\startitemize' & '\head', so I put a "\n\n" in between them. This
> causes the xml tags to appear in the output file like this
>
> ============
> <text:p text:style-name="ID">\startitemize
>
> \head</text:p>
> ============

Yes. This is what you instructed the computer to do, and apparently the
computer succeeded in meeting your request. I assume this is on the TeX
side of the conversion process, and I don't happen to know how a linefeed
is represented in TeX, but I believe that (a TeX linefeed) is what you want
to insert, not bare linefeeds (unless I have completely misunderstood you).
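
One possible explanation, offered as an assumption rather than something
confirmed in this thread: the later text:p rule in convert has no 'm'
flag, so once the replacement puts "\n\n" inside the paragraph, "." can no
longer span it and the surrounding tags are left behind. A small sketch
with a simplified copy of that rule (the helper name strip_paragraph_tags
is hypothetical):

```ruby
# Simplified stand-in for the text:p rule in convert; it keeps only the
# paragraph body and drops the surrounding tags.
def strip_paragraph_tags(xml, pattern)
  xml.gsub(pattern) { "\n" + $3 + "\n" }
end

# The rule as posted (no /m) and the same rule with /m added.
paragraph   = /<text:p.*?text:style-name=(['"])(.*?)\1>(.*?)<\/text:p>/
paragraph_m = /<text:p.*?text:style-name=(['"])(.*?)\1>(.*?)<\/text:p>/m

# What the earlier substitution leaves behind, without and with "\n\n".
one_line  = '<text:p text:style-name="ID">\startitemize\head</text:p>'
two_lines = '<text:p text:style-name="ID">\startitemize' + "\n\n" + '\head</text:p>'

puts strip_paragraph_tags(one_line, paragraph)     # tags are removed
puts strip_paragraph_tags(two_lines, paragraph)    # tags survive: "." stops at the newline
puts strip_paragraph_tags(two_lines, paragraph_m)  # tags are removed again
```

If this reading is right, adding 'm' to the text:p rule (or inserting the
blank line only after the tags have been stripped) would be one way to
keep both the spacing and clean output.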

>
> iii) any tips for improving this script are appreciated. I'm sure I'll
> have more questions over the next couple of days as I work on this.

I had hoped for a list of desired conversions, rather than a script that
needs work. Most people are reluctant to dig into someone else's code; such
an approach normally takes much longer than starting over.

I was able to format your XML this time, because the example was complete,
and having taken a look at it, I assume this is an OpenOffice Open Document
format file, yes?

Postscript. Have you considered all your options? OpenOffice will save its
documents in many formats, several of which preserve the original
formatting. For example, you could save the document as RTF, then use the
utility "rtf2TeX" to perform the conversion to TeX.

I haven't actually done this, but I can see your effort level and I thought
I would alert you to some other options.

I would also have mentioned saving as HTML and using html2tex, but I doubt
you would be pleased with the outcome (no paging or footnotes AFAIK).

Post-postscript. The TeX output is a requirement, yes? There are many
excellent output formats that are in wider use today.

--
Paul Lutus
http://www.ara...