comp.lang.ruby

search and replace

ishamid

12/2/2006 5:44:00 PM

[total novice here]

Hi,

I have a series of expressions like this (shortened from verbose XML):
=====================
[<text:sequence text:ref-name="refAutoNr0">1</text:sequence>
[<text:sequence text:ref-name="refAutoNr1">2</text:sequence>
[<text:sequence text:ref-name="refAutoNr2">3</text:sequence>
[<text:sequence text:ref-name="refAutoNr3">4</text:sequence>
=====================

I want to globally replace each such line with just

====================
\head
====================

followed by a blank line, so that I get

====================
\head

\head

\head

\head
====================

etc.
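A minimal sketch of the requested replacement, assuming each entry really sits on one line exactly as in the shortened sample above. The full XML posted later in the thread spreads each element over several lines, so this pattern would not match it as is. The constant name HEAD_LINE is only illustrative. The block form of gsub is used so the replacement text is inserted literally, with no backslash-escape processing.

```ruby
# Sample input: the shortened lines from the question.
input = <<~'XML'
  [<text:sequence text:ref-name="refAutoNr0">1</text:sequence>
  [<text:sequence text:ref-name="refAutoNr1">2</text:sequence>
  [<text:sequence text:ref-name="refAutoNr2">3</text:sequence>
  [<text:sequence text:ref-name="refAutoNr3">4</text:sequence>
XML

# One whole line: "[", an opening text:sequence tag with any attributes,
# a number, the closing tag, and the line break.
HEAD_LINE = %r{^\[<text:sequence\b[^>]*>\d+</text:sequence>\n}

# The block's return value is inserted literally: \head, then an empty line.
output = input.gsub(HEAD_LINE) { "\\head\n\n" }
puts output
```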

I am modifying a script with lines like

====================
data.gsub!(/.*?<(office:text).*?>(.*?)<\/\1>.*/mois) do
  '\starttext' + "\n" + $2 + "\n" + '\stoptext'
end
====================

and I don't yet know enough to understand it completely. A few more
hours/days of study will probably get me there, but I need this urgently, so...
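To see what the quoted script line does, here is a small self-contained run of it on a made-up miniature document (the sample string is hypothetical, not taken from a real file). The o flag is left out because there is no interpolation for it to affect, and s is left out because in Ruby it selects the Windows-31J (Shift_JIS) encoding; it has nothing to do with letting . cross newlines, which is what m does.

```ruby
# Hypothetical miniature document; a real one has far more markup.
data = <<~'XML'.dup
  <office:document-content>
    <office:body>
      <office:text>
        <text:p>Hello</text:p>
      </office:text>
    </office:body>
  </office:document-content>
XML

# .*?            skip lazily up to the first <office:text ...>
# (office:text)  group 1: the tag name, reused by \1 for the closing tag
# (.*?)          group 2: everything between the opening and closing tags
# .*             swallow the rest, so the whole string is replaced
# /m lets . match newlines; /i ignores case.
data.gsub!(/.*?<(office:text).*?>(.*?)<\/\1>.*/mi) do
  # Single quotes keep the backslash in \starttext and \stoptext literal.
  '\starttext' + "\n" + $2 + "\n" + '\stoptext'
end

puts data
```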

THNX in advance

Best
Idris

6 Answers

Paul Lutus

12/2/2006 5:57:00 PM

ishamid wrote:

/ ...

> and don't yet know enough to completely understand. Probably a few more
> hours/days of study will get me there but I need this urgently so...

If you will post a short, complete data example, even just one record as it
appears in your database, so we don't have to try to read between the
lines, someone here will be happy to produce a way to filter the data in
the way you want.

--
Paul Lutus
http://www.ara...

David Vallner

12/2/2006 6:16:00 PM

ishamid wrote:
/ ...
> and don't yet know enough to completely understand. Probably a few more
> hours/days of study will get me there but I need this urgently so...
>
> THNX in advance
>

Urght. *ducks*

Regexps and XML always tend to blow up for me. The pattern you're
searching for seems to be a complete element, why not use <insert XML
parser of choice> and XPath?

With REXML, it should be something like:

document.elements.each('//text:sequence') {|sequence|
sequence.replace_with(REXML::Text.new("\\head\n", true))}

Substitute the XPath expression with one of the desired precision. I'm a
little unsure about how REXML treats namespaces in XPath and such, but
if you know what prefix will be used in the document, that should work out.

The script might also need a little more massaging if you're outputting
plain text, but treating XML like, well, XML, with a pattern language
(XPath) that operates on the DOM structure directly, should get the heavy
lifting of searching for patterns done faster.
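A runnable sketch of the REXML idea, under two assumptions: the text: prefix is bound to the OpenDocument text namespace URI, and each numbered entry is a text:sequence inside its own paragraph. The wrapper element root is hypothetical, added only so the sample is well-formed XML with declared prefixes. Passing the prefix mapping explicitly to REXML::XPath.match sidesteps the namespace question. Unlike the two-line version, it replaces the whole enclosing paragraph, so the "[" goes too.

```ruby
require 'rexml/document'

# Assumed namespace URIs (OpenDocument); the real file declares its own.
TEXT_NS  = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0'
STYLE_NS = 'urn:oasis:names:tc:opendocument:xmlns:style:1.0'

# Entries from the posted sample inside a hypothetical <root> that
# declares the text: and style: prefixes.
xml = <<~XML
  <root xmlns:text="#{TEXT_NS}" xmlns:style="#{STYLE_NS}">
  <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr2" style:num-format="1">3</text:sequence></text:p>
  <text:p text:style-name="Standard">Abarahamov, Binyamin</text:p>
  <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr3" style:num-format="1">4</text:sequence></text:p>
  </root>
XML

doc = REXML::Document.new(xml)

# XPath.match returns an Array first, so replacing nodes while looping is safe.
# Each sequence's enclosing paragraph becomes a text node: \head + empty line.
REXML::XPath.match(doc, '//text:sequence', 'text' => TEXT_NS).each do |seq|
  seq.parent.replace_with(REXML::Text.new("\\head\n\n", true))
end

doc.write($stdout)
```

Turning the rest of the document into plain text for TeX would still take more work; this only covers the numbered paragraphs.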

David Vallner

ishamid

12/2/2006 6:53:00 PM

Hi Paul,

On Dec 2, 10:56 am, Paul Lutus wrote:

> If you will post a short, complete data example, even just one record as it
> appears in your database, so we don't have to try to read between the
> lines, someone here will be happy to produce a way to filter the data in
> the way you want.

Ok, here are 4 bibliography entries. I just did a follow-up posting
with more detail (including the full script I'm trying to modify) so
you may prefer to respond to that one. Thank you very much for your
help!

======================
<text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr0"
text:name="AutoNr" text:formula="ooow:AutoNr+1"
style:num-format="1">1</text:sequence></text:p>
<text:p text:style-name="P6">&apos;Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T3">Die al-Azhar-Moschee</text:span><text:span
text:style-name="T4">., in, </text:span><text:span
text:style-name="T3">&quot;Schätze der Kalifen: Islamische Kunst zur
Fatimidenzeit.&quot;</text:span><text:span text:style-name="T4">,
Herausgegeben von W. Seipel, Vienna: Kunsthistorisches Museum Wien;
Milan: Skira, 1998, pp. 144-147</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID"><text:span
text:style-name="T5">[</text:span><text:sequence
text:ref-name="refAutoNr1" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">2</text:sequence></text:p>
<text:p text:style-name="P8">&apos;Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T6">La mosquée al-Azhar</text:span><text:span
text:style-name="T7">., in, </text:span><text:span
text:style-name="T6">&quot;Trésors fatimides du Caire. Exposition
présentée à l&apos;Institut du Monde Arabe ...
</text:span><text:span
text:style-name="T8">1998.&quot;</text:span><text:span
text:style-name="T9">, Paris: Institut du Monde Arabe, 1998, pp.
147-149</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr2" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">3</text:sequence></text:p>
<text:p text:style-name="Standard"><text:s/>&apos;Amri, Husay
&apos;Abdallah</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">The Text of an Unpublished Fatwa of the Scholar
al-Maqbali (d. 1108/1728) Concerning the Legal Position of the
Batiniyyah (Isma&apos;iliyyah) of the People of Hamdan</text:span>.,
Translated by A.B.D.R. Eagle, <text:span text:style-name="Style2">New
Arabian Studies</text:span>, 2 (1994), pp. 165-174.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr3" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">4</text:sequence></text:p>
<text:p text:style-name="Standard">Abarahamov, Binyamin</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">An Isma&apos;ili Epistemology: The Case of
Al-Da&apos;i al-Mutlaq &apos;Ali b. Muhammad b. al-Walid</text:span>.,
<text:span text:style-name="Style2">Journal of Semitic
Studies</text:span>, 41ii (1996), pp. 263-273.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>

======================

ishamid

12/2/2006 7:02:00 PM

Thank you, David, for your pointers. I'm still very much a novice (at
the level of Chris Pine's Learn to Program), so I couldn't follow all of
them, but I hope to learn fast. I just sent a follow-up with more detail,
including the script I'm trying to modify; I hope you have a chance to
look at it...

Thank you again
Idris

Paul Lutus

12/3/2006 1:26:00 AM

ishamid wrote:

> Hi Paul,
>
> On Dec 2, 10:56 am, Paul Lutus wrote:
>
>> If you will post a short, complete data example, even just one record as it
>> appears in your database, so we don't have to try to read between the
>> lines, someone here will be happy to produce a way to filter the data in
>> the way you want.
>
> Ok, here are 4 bibliography entries. I just did a follow-up posting
> with more detail (including the full script I'm trying to modify) so
> you may prefer to respond to that one. Thank you very much for your
> help!

Okay, thanks for the data example. Now to move forward, could you please
tell us what you want to do with it? Which parts of the data end up in the
output, and in what form?

You earlier said you wanted to process the XML to get a series of

\head

\head

\head

\head


But I think you mean these to be placeholders for the actual data, and I
can't sort out which parts of the XML are meant to end up in the "\head"
elements.

It would help if you could show an example of the data in the XML and its
literal relocation into the desired output format.

Postscript. I copied your posted data example and couldn't parse it, because
there is a mismatch between opening and closing tags -- a simple sanity check
I always perform when dealing with XML. Unfortunately the posted data isn't a
complete, internally consistent XML sample; a complete one would have let me
indent/format the XML and get some idea of its overall structure.

Without an internally consistent XML data block with balanced tags, I can't
parse the XML, and if I can't parse the XML, I can't extract any data from
it in a reliable way.
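The balanced-tag sanity check described above can be automated with REXML, which raises REXML::ParseException when it cannot parse its input. A minimal sketch; the helper name xml_problem and both sample strings are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical helper: nil if the text parses as XML, else the parser's message.
def xml_problem(text)
  REXML::Document.new(text)
  nil
rescue REXML::ParseException => e
  e.message
end

good = '<a><b>ok</b></a>'
bad  = '<a><b>oops</a></b>'  # end tags in the wrong order

p xml_problem(good)               # prints nil
puts xml_problem(bad).lines.first # first line of REXML's complaint
```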

--
Paul Lutus
http://www.ara...

William James

12/3/2006 4:52:00 AM

ishamid wrote:
> Ok, here are 4 bibliography entries.
/ ...

# DATA reads everything after the __END__ marker (here, the sample XML).
puts DATA.read.gsub( %r{<(text:sequence)\s[^>]*>(.*?)</\1>}i,
"\\starttext\n\\2\n\\stoptext" )

--- output -----

<text:p text:style-name="ID">[\starttext
1
\stoptext</text:p>
<text:p text:style-name="P6">&apos;Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T3">Die al-Azhar-Moschee</text:span><text:span
text:style-name="T4">., in, </text:span><text:span
text:style-name="T3">&quot;Schätze der Kalifen: Islamische Kunst zur
Fatimidenzeit.&quot;</text:span><text:span text:style-name="T4">,
Herausgegeben von W. Seipel, Vienna: Kunsthistorisches Museum Wien;
Milan: Skira, 1998, pp. 144-147</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID"><text:span
text:style-name="T5">[</text:span>\starttext
2
\stoptext</text:p>
<text:p text:style-name="P8">&apos;Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T6">La mosquée al-Azhar</text:span><text:span
text:style-name="T7">., in, </text:span><text:span
text:style-name="T6">&quot;Trésors fatimides du Caire. Exposition
présentée à l&apos;Institut du Monde Arabe ...
</text:span><text:span
text:style-name="T8">1998.&quot;</text:span><text:span
text:style-name="T9">, Paris: Institut du Monde Arabe, 1998, pp.
147-149</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID">[\starttext
3
\stoptext</text:p>
<text:p text:style-name="Standard"><text:s/>&apos;Amri, Husay
&apos;Abdallah</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">The Text of an Unpublished Fatwa of the Scholar
al-Maqbali (d. 1108/1728) Concerning the Legal Position of the
Batiniyyah (Isma&apos;iliyyah) of the People of Hamdan</text:span>.,
Translated by A.B.D.R. Eagle, <text:span text:style-name="Style2">New
Arabian Studies</text:span>, 2 (1994), pp. 165-174.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[\starttext
4
\stoptext</text:p>
<text:p text:style-name="Standard">Abarahamov, Binyamin</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">An Isma&apos;ili Epistemology: The Case of
Al-Da&apos;i al-Mutlaq &apos;Ali b. Muhammad b. al-Walid</text:span>.,
<text:span text:style-name="Style2">Journal of Semitic
Studies</text:span>, 41ii (1996), pp. 263-273.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
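The substitution above keeps the surrounding <text:p> and the literal "[", so it does not yet produce the bare \head lines asked for. A minimal sketch that swaps out each whole ID paragraph instead, assuming (as in the four posted entries) that every numbered entry is a text:p with text:style-name="ID" whose content ends in </text:sequence></text:p>. The constant name ID_PARA is only illustrative.

```ruby
# Lines taken from the posted sample (line breaks as posted);
# the second ID paragraph wraps the "[" in a <text:span>.
data = <<~'XML'
  <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr0"
  text:name="AutoNr" text:formula="ooow:AutoNr+1"
  style:num-format="1">1</text:sequence></text:p>
  <text:p text:style-name="P6">&apos;Abd al-Râziq, Ahmad</text:p>
  <text:p text:style-name="ID"><text:span
  text:style-name="T5">[</text:span><text:sequence
  text:ref-name="refAutoNr1" text:name="AutoNr"
  text:formula="ooow:AutoNr+1"
  style:num-format="1">2</text:sequence></text:p>
  <text:p text:style-name="P8">&apos;Abd al-Râziq, Ahmad</text:p>
XML

# From an opening ID paragraph, lazily across line breaks (/m), to the first
# </text:sequence></text:p> that closes it.
ID_PARA = %r{<text:p text:style-name="ID">.*?</text:sequence></text:p>}m

# Each ID paragraph becomes \head; the newline that followed it in the file
# is kept, which supplies the blank line.
result = data.gsub(ID_PARA) { "\\head\n" }
puts result
```

On real documents this pattern would miss an ID paragraph formatted differently, which is where the parser-based approach is more robust.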