Richard Heathfield
12/22/2014 6:43:00 PM
Victor Porton wrote:
> I am writing a program (in fact a Perl script), which should do
> replacement of strings in a file, like this:
>
> AB->BC
> BC->AB
> YY->ZZ
>
> Note that Sed and relatives fail to do this task: It would replace BC back
> with AB after replacing AB with BC, and this is not what I need.
>
> What command-line syntax should I use to specify the strings to be
> replaced and their replacement strings?
>
> I need advice about the syntax of the command line.
The idea sounds simple enough (although I can see some difficulties in
writing the code in a robust and predictable way), but I want to be sure
that I understand you. If I have it right, the user identifies a number of
pairs of strings - to take your example, the pairs would be { AB, BC }, {
BC, AB }, and { YY, ZZ } - and the program then does the following:
1) for each pair:
     identify all instances of the left member, and note the start and end
     position of each instance.
   This loop is done *first*.
2) for each pair:
     for each instance:
       replace the left member with the right member.
So, for example, if we have the string
ABNNABNNBCNNZZNNYY
and the pairs { AB, BC }, { BC, AB }, { YY, ZZ }
we proceed as follows:
Phase 1: split
    AB
    NN
    AB
    NN
    BC
    NNZZNN
    YY
Phase 2: replace
    BC
    NN
    BC
    NN
    AB
    NNZZNN
    ZZ
Phase 3 (possibly incorporated into Phase 2): splice
    BCNNBCNNABNNZZNNZZ
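To make the three phases concrete, here is a rough sketch in Python (the same shape should carry over to Perl). The function names split_on_pairs, replace_pieces and swap_strings are mine, purely for illustration. I have also assumed one particular rule for ties: scanning left to right, the first listed pair that matches at a position wins. That rule is an assumption, not something the original question specifies.

```python
def split_on_pairs(text, pairs):
    """Phase 1: cut text into (segment, is_match) pieces; nothing is replaced yet."""
    lefts = [left for left, _ in pairs]
    pieces = []
    plain_start = 0  # start of the current run of unmatched text
    i = 0
    while i < len(text):
        # Assumed tie-break: the first listed left member matching here wins.
        hit = next((l for l in lefts if l and text.startswith(l, i)), None)
        if hit is None:
            i += 1
            continue
        if plain_start < i:
            pieces.append((text[plain_start:i], False))
        pieces.append((hit, True))
        i += len(hit)
        plain_start = i
    if plain_start < len(text):
        pieces.append((text[plain_start:], False))
    return pieces


def replace_pieces(pieces, pairs):
    """Phase 2: swap each matched segment for its right member."""
    table = {}
    for left, right in pairs:
        table.setdefault(left, right)  # first listed pair wins on duplicates
    return [table[seg] if is_match else seg for seg, is_match in pieces]


def swap_strings(text, pairs):
    """Phase 3: splice the pieces back together."""
    return "".join(replace_pieces(split_on_pairs(text, pairs), pairs))


pairs = [("AB", "BC"), ("BC", "AB"), ("YY", "ZZ")]
print(swap_strings("ABNNABNNBCNNZZNNYY", pairs))  # BCNNBCNNABNNZZNNZZ
```

Because the replacements are written only after every match has been located, a BC produced from AB is never looked at again, which is exactly what chained sed substitutions get wrong.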
The difficulty comes when you have overlapping search strings. You need to
be very careful how you handle (and document) them.
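To show why the overlap policy needs documenting, here is a small Python sketch that does the whole job in one pass with a regular-expression alternation. The function name swap_strings and the longest_first switch are hypothetical, introduced only for this illustration. The sketch assumes two possible policies: prefer the longest search string at each position, or prefer whichever was listed first. Neither is "the" correct answer; the point is that they give different output.

```python
import re


def swap_strings(text, pairs, longest_first=True):
    """Replace every left member with its right member in a single pass."""
    table = {}
    for left, right in pairs:
        table.setdefault(left, right)  # first listed pair wins on duplicates
    lefts = [left for left in table if left]  # ignore empty search strings
    if not lefts:
        return text
    if longest_first:
        # sorted() is stable, so equal-length strings keep their listed order.
        lefts = sorted(lefts, key=len, reverse=True)
    # At each position the regex engine tries the alternatives in order.
    pattern = re.compile("|".join(re.escape(left) for left in lefts))
    return pattern.sub(lambda match: table[match.group(0)], text)


overlapping = [("A", "1"), ("AB", "2")]
print(swap_strings("AB", overlapping, longest_first=True))   # 2
print(swap_strings("AB", overlapping, longest_first=False))  # 1B
```

Search strings that overlap without one being a prefix of the other are a separate case: with { AB, x } and { BC, y } on the input ABC, a left-to-right scan takes AB first and leaves the C alone, so BC never matches. Whichever behaviour you pick, say so in the documentation.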
Anyway, assuming that's the general idea, the command line syntax could be
something like:
programname -<options> -pairs p1a p1b [p2a p2b [p3a p3b [...]]] filespec
Options would be, for example, case-insensitivity (-i) and the like. Then
would come the pairs - the switch '-pairs' might be considered surplus to
requirements, so feel free to ditch it - and finally there is the file
specification (if you're on Unix, you can just let globbing happen).
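One practical wrinkle with a bare list of pairs followed by a filespec is deciding where the pairs stop and the filenames start, especially once globbing has expanded the filespec into several names. A variant that avoids the question is to give each pair its own switch. The sketch below uses Python's argparse to show the idea; the -p/--pair switch and the build_parser name are my own assumptions, not part of the syntax suggested above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="programname",
        description="Replace several strings simultaneously.",
    )
    parser.add_argument(
        "-i", "--ignore-case", action="store_true",
        help="match search strings case-insensitively",
    )
    # Hypothetical switch: one -p OLD NEW per pair, repeatable.
    parser.add_argument(
        "-p", "--pair", nargs=2, action="append", default=[],
        metavar=("OLD", "NEW"),
        help="replace OLD with NEW (may be given more than once)",
    )
    parser.add_argument("files", nargs="+", help="files to process")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.ignore_case, args.pair, args.files)
```

An invocation would then look like: programname -i -p AB BC -p BC AB -p YY ZZ j*.txt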
Alternatively, if there are many pairs, you could put them in a tab-
separated file, one pair per line, and specify the filename - e.g.
programname -i -f=swappairs.tsv j*.txt
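Reading such a pairs file is straightforward. Here is a minimal Python sketch, assuming UTF-8 text, exactly one tab per line, and blank lines being skipped; those details and the names parse_pairs and load_pairs are my assumptions, since the format is not pinned down any further than "tab-separated, one pair per line".

```python
def parse_pairs(lines):
    """Turn an iterable of 'old<TAB>new' lines into a list of (old, new) tuples."""
    pairs = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumed: blank lines are ignored
        fields = line.split("\t")
        if len(fields) != 2 or not fields[0]:
            raise ValueError(f"line {number}: expected 'old<TAB>new', got {line!r}")
        pairs.append((fields[0], fields[1]))
    return pairs


def load_pairs(path):
    """Read pairs from a tab-separated file such as swappairs.tsv."""
    with open(path, encoding="utf-8", newline="") as handle:
        return parse_pairs(handle)
```

Rejecting malformed lines outright, rather than guessing, seems the safer choice for a tool that rewrites files in place.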
--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within