Asp Forum - Suggest a command line syntax

Victor Porton

12/22/2014 6:10:00 PM

I am writing a program (in fact, a Perl script) that should replace strings
in a file, like this:

AB->BC
BC->AB
YY->ZZ

Note that sed and its relatives fail at this task: they would replace BC
back with AB after replacing AB with BC, which is not what I need.

What command-line syntax should I use to specify the strings to be replaced
and their replacements?

I need advice about the command-line syntax.

--
Victor Porton - http://porton...

6 Answers

Richard Heathfield

12/22/2014 6:43:00 PM

Victor Porton wrote:

> I am writing a program (in fact, a Perl script) that should replace
> strings in a file, like this:
>
> AB->BC
> BC->AB
> YY->ZZ
>
> Note that sed and its relatives fail at this task: they would replace BC
> back with AB after replacing AB with BC, which is not what I need.
>
> What command-line syntax should I use to specify the strings to be
> replaced and their replacements?
>
> I need advice about the command-line syntax.

The idea sounds simple enough (although I can see some difficulties in
writing the code in a robust and predictable way), but I want to be sure
that I understand you. If I have it right, the user identifies a number of
pairs of strings - to take your example, the pairs would be { AB, BC }, {
BC, AB }, and { YY, ZZ } - and the program then does the following:

1) for each pair:
      identify all instances of the left member, and note the start and end
      position of each.

   This loop is done *first*.

2) for each pair:
      for each instance:
         replace the left member with the right member.

So, for example, if we have the string

ABNNABNNBCNNZZNNYY

and the pairs { AB, BC }, { BC, AB }, { YY, ZZ }
we proceed as follows:

Phase 1: split

AB
NN
AB
NN
BC
NNZZNN
YY

Phase 2: replace

BC
NN
BC
NN
AB
NNZZNN
ZZ

Phase 3 (possibly incorporated into Phase 2): splice

BCNNBCNNABNNZZNNZZ

The difficulty comes when you have overlapping search strings. You need to
be very careful how you handle (and document) them.
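A minimal Python sketch of the split/replace/splice procedure above follows. The overlap policy is an assumption made for illustration, since the thread does not settle it: the leftmost match wins, and at the same starting position the longest left member wins. The function names are hypothetical.

```python
def find_matches(text, pairs):
    """Phase 1: record (start, end, replacement) for every occurrence of every left member."""
    matches = []
    for left, right in pairs:
        if not left:
            raise ValueError("empty search strings are not supported")
        start = text.find(left)
        while start != -1:
            matches.append((start, start + len(left), right))
            start = text.find(left, start + 1)  # +1 so overlapping occurrences are seen too
    return matches


def resolve_overlaps(matches):
    """Assumed policy: leftmost match wins; at the same start, the longest match wins."""
    chosen, last_end = [], 0
    for start, end, right in sorted(matches, key=lambda m: (m[0], m[0] - m[1])):
        if start >= last_end:  # skip anything overlapping an already chosen match
            chosen.append((start, end, right))
            last_end = end
    return chosen


def parallel_replace(text, pairs):
    """Phases 2 and 3: substitute each chosen match and splice the pieces back together."""
    pieces, pos = [], 0
    for start, end, right in resolve_overlaps(find_matches(text, pairs)):
        pieces.append(text[pos:start])  # untouched text before this match
        pieces.append(right)            # the replacement
        pos = end
    pieces.append(text[pos:])           # untouched tail
    return "".join(pieces)


print(parallel_replace("ABNNABNNBCNNZZNNYY", [("AB", "BC"), ("BC", "AB"), ("YY", "ZZ")]))
# prints BCNNBCNNABNNZZNNZZ
```

Because every match is located against the original text before anything is replaced, a BC produced by replacing AB is never seen again. This is the property sed's sequential `s///` commands lack.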

Anyway, assuming that's the general idea, the command line syntax could be
something like:

programname -<options> -pairs p1a p1b [p2a p2b [p3a p3b [...]]] filespec

Options would be, for example, case-insensitivity (-i) and the like. Then
would come the pairs - the switch '-pairs' might be considered surplus to
requirements, so feel free to ditch it - and finally there is the file
specification (if you're on Unix, you can just let globbing happen).

Alternatively, if there are many pairs, you could put them in a tab-
separated file, one pair per line, and specify the filename - e.g.

programname -i -f=swappairs.tsv j*.txt
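Reading such a pair file could look like the following Python sketch. It assumes one pair per line with exactly one tab between the two strings, and it ignores blank lines. The file layout and the function name are assumptions.

```python
def read_pairs_tsv(lines):
    """Parse 'search<TAB>replacement' lines into a list of (search, replacement) tuples."""
    pairs = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # allow blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 tab-separated fields, got {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


# Usage, with the file name from the example above:
# with open("swappairs.tsv", encoding="utf-8") as f:
#     pairs = read_pairs_tsv(f)
```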

--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within

Victor Porton

12/22/2014 6:48:00 PM

Richard Heathfield wrote:

> Anyway, assuming that's the general idea, the command line syntax could be
> something like:
>
>
>
> programname -<options> -pairs p1a p1b [p2a p2b [p3a p3b [...]]] filespec

This way it is easy to confuse <filespec> with pairs.

> Options would be, for example, case-insensitivity (-i) and the like. Then
> would come the pairs - the switch '-pairs' might be considered surplus to
> requirements, so feel free to ditch it - and finally there is the file
> specification (if you're on Unix, you can just let globbing happen).
>
> Alternatively, if there are many pairs, you could put them in a tab-
> separated file, one pair per line, and specify the filename - e.g.
>
> programname -i -f=swappairs.tsv j*.txt

Hm, maybe use CSV instead? It allows encoding arbitrary strings, including
commas and tabs.

Richard Heathfield

12/22/2014 7:45:00 PM

Victor Porton wrote:

> Richard Heathfield wrote:
>
>> Anyway, assuming that's the general idea, the command line syntax could
>> be something like:
>>
>>
>>
>> programname -<options> -pairs p1a p1b [p2a p2b [p3a p3b [...]]] filespec
>
> This way it is easy to confuse <filespec> with pairs.

So you put a marker between them, e.g. -f filespec
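In Python, argparse provides such a marker with little extra work. The sketch below treats the strings before -f as alternating search and replacement strings, and everything after -f as files. The option letters follow the posts. The function name is hypothetical. Pair strings that begin with "-" would be mistaken for options, so a real tool would need another route for those, such as a pair file.

```python
import argparse


def parse_command_line(argv):
    """Parse: [-i] p1a p1b [p2a p2b ...] -f file [file ...]"""
    parser = argparse.ArgumentParser(prog="programname")
    parser.add_argument("-i", action="store_true", help="case-insensitive matching")
    parser.add_argument("pairs", nargs="*", metavar="STRING",
                        help="search/replacement strings, alternating")
    parser.add_argument("-f", dest="files", nargs="+", required=True,
                        help="files to process (everything after -f is a file)")
    args = parser.parse_args(argv)
    if len(args.pairs) % 2:
        parser.error("search and replacement strings must come in pairs")
    pairs = list(zip(args.pairs[0::2], args.pairs[1::2]))
    return args.i, pairs, args.files
```

Since `-f` takes every remaining argument, the pairs have to come before it. That ordering is what removes the ambiguity Victor pointed out.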

>
>> Options would be, for example, case-insensitivity (-i) and the like. Then
>> would come the pairs - the switch '-pairs' might be considered surplus to
>> requirements, so feel free to ditch it - and finally there is the file
>> specification (if you're on Unix, you can just let globbing happen).
>>
>> Alternatively, if there are many pairs, you could put them in a tab-
>> separated file, one pair per line, and specify the filename - e.g.
>>
>> programname -i -f=swappairs.tsv j*.txt
>
> Hm, maybe use CSV instead? It allows encoding arbitrary strings,
> including commas and tabs.

Sure, but I deliberately chose tab because you're more likely to want to
include a comma in a pair than include a tab in a pair. But whatever floats
your boat.
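If CSV is chosen, Python's standard csv module already handles the quoting. A pair containing commas, tabs or double quotes therefore survives a round trip. The helper names in this sketch are hypothetical. The same reader also accepts a tab-separated layout when you pass `csv.reader(..., delimiter="\t")`.

```python
import csv
import io


def write_pairs_csv(pairs):
    """Serialize (search, replacement) pairs; the csv module quotes fields as needed."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(pairs)
    return buf.getvalue()


def read_pairs_csv(text):
    """Parse CSV text back into (search, replacement) tuples, skipping blank lines."""
    pairs = []
    for row in csv.reader(io.StringIO(text, newline="")):
        if not row:
            continue
        if len(row) != 2:
            raise ValueError(f"expected 2 fields, got {len(row)}: {row!r}")
        pairs.append((row[0], row[1]))
    return pairs


pairs = [("a,b", "c\td"), ('say "hi"', "x")]
print(read_pairs_csv(write_pairs_csv(pairs)) == pairs)  # True
```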

(If you follow all the advice in this post and my last, you'll end up with
two -fs with different meanings! Choose a different switch for the tsv/csv
file - e.g. -m for meta or -p for pairfile or something like that.)

Kaz Kylheku

12/22/2014 8:10:00 PM

On 2014-12-22, Victor Porton <porton@narod.ru> wrote:
> I am writing a program (in fact, a Perl script) that should replace strings
> in a file, like this:
>
> AB->BC
> BC->AB
> YY->ZZ
>
> Note that sed and its relatives fail at this task: they would replace BC
> back with AB after replacing AB with BC, which is not what I need.
>
> What command-line syntax should I use to specify the strings to be replaced
> and their replacements?

This has nothing to do with syntax; it is a matter of semantics.

The sed -e option syntax could have the semantics of parallelism:

sed --fantasy-parallel-behavior -e 's/AB/BC/g' -e 's/BC/AB/g'

It would work like this: the left-hand sides of all the regex replacements
would be assembled into one big regex:

(AB|BC)

Based on which one matches, the appropriate replacement is applied.
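For literal (non-regex) search strings, the combined-alternation idea can be sketched in Python using re.sub with a callback that looks up the matched text. The sketch makes two assumptions. First, the left-hand sides are escaped and treated as plain strings. Second, longer strings are tried first, because Python's alternation takes the first alternative that matches rather than the longest one.

```python
import re


def parallel_sub(text, pairs):
    """Replace all search strings in one pass over the text."""
    if not pairs:
        return text
    table = dict(pairs)  # search string -> replacement
    # Longest first, so "ABC" is preferred over "AB" where both match at one position.
    alternatives = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in alternatives))
    # The callback picks the replacement based on which string matched.
    return pattern.sub(lambda m: table[m.group(0)], text)


print(parallel_sub("AB BC YY", [("AB", "BC"), ("BC", "AB"), ("YY", "ZZ")]))  # BC AB ZZ
```

Supporting real regexes on the left would need a different way to tell which alternative matched. One option is to wrap each alternative in its own named group and check `m.lastgroup`.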

> I need advice about the command-line syntax.

If the behavior is strictly serial or strictly parallel, then the syntax
needs no special feature.

You need one only if you want the expressivity of both. For instance, square
brackets could group together replacements that are parallel.

The open square bracket can combine with an option to create
richer expressivity.

Lack of square brackets indicates sequential processing.

# -p[ denotes parallel substitutions

$ ./super-replacer -p[ AB BC BC AB ] YY ZZ

Here, AB is replaced with BC in parallel with BC being replaced with AB.
Then, YY is replaced with ZZ.
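The following Python sketch shows how the bracket syntax could be parsed and executed. It assumes whitespace-separated arguments as in the example, handles only the -p[ form, and treats all strings literally, so a search string that is literally "]" cannot be expressed. Each -p[ ... ] group becomes one parallel stage. Every bare pair outside brackets becomes a stage of its own. Stages run in order. All names are hypothetical.

```python
import re


def parallel_sub(text, pairs):
    """One parallel stage: a single alternation regex, replacement chosen by the matched string."""
    if not pairs:
        return text
    table = dict(pairs)
    keys = sorted(table, key=len, reverse=True)  # longer keys win at the same position
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


def parse_stages(args):
    """'-p[ a b c d ]' is one parallel stage; a bare 'a b' pair is a stage of its own."""
    stages, i = [], 0
    while i < len(args):
        if args[i] == "-p[":
            end = args.index("]", i + 1)  # raises ValueError if the bracket is never closed
            group = args[i + 1:end]
            if len(group) % 2:
                raise ValueError("-p[ needs an even number of strings")
            stages.append(list(zip(group[0::2], group[1::2])))
            i = end + 1
        else:
            if i + 1 >= len(args):
                raise ValueError(f"unpaired string: {args[i]!r}")
            stages.append([(args[i], args[i + 1])])
            i += 2
    return stages


def run_stages(text, stages):
    """Stages are sequential; the pairs inside one stage are parallel."""
    for stage in stages:
        text = parallel_sub(text, stage)
    return text


stages = parse_stages(["-p[", "AB", "BC", "BC", "AB", "]", "YY", "ZZ"])
print(run_stages("AB BC YY", stages))  # BC AB ZZ
```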

A differently-optioned open bracket can do something else:

# -s[ is a shorthand for swap

$ ./super-replacer -s[ AB BC ] YY ZZ

Here, -s[ AB BC ] expands to -p[ AB BC BC AB ]. -s[ can take an arbitrary
number of pairs, all of which are swapped in parallel.

Right rotation is handled with -r[, which takes any number of arguments
up to the closing bracket.

$ ./super-replacer -r[ AB BC CD ]

The above instance is equivalent to -p[ AB BC BC CD CD AB ].

A right shift could be represented with -h[:

$ ./super-replacer -h[ AB BC CD FOO ]

This is equivalent to -p[ AB BC BC CD CD FOO ]. "AB" falls out of the
"shift register", replaced by "BC"; "BC" is replaced with "CD"; and "CD"
is replaced with "FOO", which is effectively shifted in.
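The shorthand forms can be reduced to -p[ groups mechanically. This Python sketch reproduces the three equivalences given above. The function names are hypothetical.

```python
def expand_swap(args):
    """-s[ a b c d ]: a <-> b and c <-> d, all in parallel."""
    if len(args) % 2:
        raise ValueError("-s[ needs an even number of strings")
    pairs = []
    for a, b in zip(args[0::2], args[1::2]):
        pairs += [(a, b), (b, a)]
    return pairs


def expand_rotate(args):
    """-r[ a b c ]: a -> b, b -> c, and the last wraps around: c -> a."""
    return [(args[i], args[(i + 1) % len(args)]) for i in range(len(args))]


def expand_shift(args):
    """-h[ a b c d ]: a -> b, b -> c, c -> d; a falls out and d is shifted in."""
    return list(zip(args, args[1:]))


def to_p_group(pairs):
    """Flatten pairs back into the '-p[ ... ]' argument form."""
    return ["-p["] + [s for pair in pairs for s in pair] + ["]"]


print(" ".join(to_p_group(expand_rotate(["AB", "BC", "CD"]))))  # -p[ AB BC BC CD CD AB ]
```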

robertwessel2@yahoo.com

12/22/2014 8:46:00 PM

On Mon, 22 Dec 2014 20:10:06 +0200, Victor Porton <porton@narod.ru>
wrote:

>I am writing a program (in fact, a Perl script) that should replace strings
>in a file, like this:
>
>AB->BC
>BC->AB
>YY->ZZ
>
>Note that sed and its relatives fail at this task: they would replace BC
>back with AB after replacing AB with BC, which is not what I need.

Sure it will; you just need to do it a bit differently. Do the
following:

change AB -> QQQQ
change BC -> AB
change YY -> ZZ
change QQQQ -> BC

If "QQQQ" conflicts with another string in your source, use a
different temporary string in the first and fourth steps.
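The placeholder sequence can be written as a short Python sketch, with a guard for the conflict case mentioned above. The final step turns the placeholder into BC, the intended replacement for AB. The step list is specific to this example, and the names are hypothetical.

```python
def run_steps(text, steps):
    """Apply each (old, new) replacement to the whole text, strictly one after another."""
    for old, new in steps:
        text = text.replace(old, new)
    return text


TEMP = "QQQQ"  # placeholder; must not already occur in the input
STEPS = [("AB", TEMP), ("BC", "AB"), ("YY", "ZZ"), (TEMP, "BC")]


def swap_with_placeholder(text):
    if TEMP in text:
        raise ValueError(f"placeholder {TEMP!r} already occurs in the input")
    return run_steps(text, STEPS)


print(swap_with_placeholder("ABNNABNNBCNNZZNNYY"))  # BCNNBCNNABNNZZNNZZ
```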

Kaz Kylheku

12/22/2014 9:40:00 PM

On 2014-12-22, Robert Wessel <robertwessel2@yahoo.com> wrote:
> On Mon, 22 Dec 2014 20:10:06 +0200, Victor Porton <porton@narod.ru>
> wrote:
>
>>I am writing a program (in fact, a Perl script) that should replace strings
>>in a file, like this:
>>
>>AB->BC
>>BC->AB
>>YY->ZZ
>>
>>Note that sed and its relatives fail at this task: they would replace BC
>>back with AB after replacing AB with BC, which is not what I need.
>
>
> Sure it will; you just need to do it a bit differently. Do the
> following:
>
> change AB -> QQQQ
> change BC -> AB
> change YY -> ZZ
> change QQQQ -> BC
>
> If "QQQQ" conflicts with another string in your source, use a
> different temporary string in the first and fourth steps.

This can be made "air tight" if it includes an extra pass that detects
whether a rename string such as QQQQ occurs in the file.

while not long_enough(temp_string_list) do
    do
        t = generate_unique_temp_string()
    while (t occurs in input file or t occurs in replacement strings)
    push t onto temp_string_list
done

After this, you have as many temp strings as you need.
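The loop above translates fairly directly into Python. In this sketch the candidate generator is an assumption: it yields numbered, brace-delimited tokens ({T0}, {T1}, ...), which are distinct by construction. Each candidate is kept only if it appears neither in the input text nor in any replacement string. The function name is hypothetical.

```python
import itertools


def pick_temp_strings(text, replacement_strings, count):
    """Return `count` placeholders that occur neither in `text` nor in any replacement string."""
    temps = []
    candidates = ("{T%d}" % n for n in itertools.count())  # hypothetical naming scheme
    while len(temps) < count:
        t = next(candidates)
        if t in text or any(t in s for s in replacement_strings):
            continue  # collides with existing content: try the next candidate
        temps.append(t)
    return temps


print(pick_temp_strings("x {T0} y", ["{T1}"], 2))  # ['{T2}', '{T3}']
```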

Note that you never want the temp strings to actually have a form like QQQQ.
This is because QQQQ has a suffix and a prefix that can combine to create
a spurious QQQQ.

If the original text contains no QQQQ but contains QQAB, then after the
AB -> QQQQ substitution you have QQQQQQ. Oops!

A brace-delimited token like {QQQQ} would be better. An insertion of {QQQQ}
cannot combine with fragments of {QQQQ} to produce a spurious {QQQQ}.
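Here is a small Python check of the QQAB example, using a one-pair version of the placeholder trick (AB -> placeholder, then placeholder -> BC). The has_border helper is an addition, not something from the thread. It tests whether some proper prefix of a token is also a suffix, which is the property that lets copies of QQQQ run together. {QQQQ} has no such prefix: every proper prefix starts with "{", and no proper suffix does.

```python
def has_border(token):
    """True if some proper prefix of token is also a suffix (e.g. "QQ" in "QQQQ")."""
    return any(token[:k] == token[-k:] for k in range(1, len(token)))


def ab_to_bc_via(text, placeholder):
    """AB -> placeholder, then placeholder -> BC."""
    return text.replace("AB", placeholder).replace(placeholder, "BC")


print(has_border("QQQQ"), ab_to_bc_via("QQAB", "QQQQ"))      # True BCQQ (spurious match)
print(has_border("{QQQQ}"), ab_to_bc_via("QQAB", "{QQQQ}"))  # False QQBC (correct)
```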

comp.programming
