Asp Forum - tree-analyser workflow tool [opinion wanted]

Jacek Grzebyta

1/7/2016 3:35:00 PM

Hi,

I have published the tree-analyser workflow tool on GitHub at

https://github.com/jgrzebyta/tre...

The aim of the tool is to automate parsing files within a
file tree according to a prepared configuration file.
The tool can parse an input file using either system-wide
software or dedicated scripts.
An example configuration file is included as test/example.config, and
an example "project" is in the test/example directory.

Any opinions are welcome.

Regards,
Jacek

6 Answers

Kaz Kylheku

1/7/2016 4:16:00 PM

On 2016-01-07, Jacek Grzebyta <jgrzebyta@users.sourceforge.net> wrote:
> Hi,
>
> I have published on GitHub tree-analyser workflow tool at
>
> https://github.com/jgrzebyta/tre...

"Do this shell command to these files in this directory" is much
more easily achieved using ... the shell!

Let us compare:

(:id "root"
 :directory "example/"
 :resources (
   (:id "pdfs"
    :resources "*.pdf"
    :parsers (
      :commands ("pdftotext -layout -nopgbrk $FILENAME -"
                 "sed -n \"/Table of .*/,/Chapter/p\" -"
                 )
      :output "pdfs.txt"
      )
    )
   (:id "as*-pdfs"
    :resources "as*.pdf"
    :parsers (
      :commands ("pdftotext -layout $FILENAME -")
      :output "as-pdfs.txt"
      )
    )
   )
 )

Versus:

root()
{
  > pdfs.txt # ensure existence and zero length

  for file in example/*.pdf ; do
    pdftotext -layout -nopgbrk "$file" - |
      sed -n "/Table of .*/,/Chapter/p" >> pdfs.txt
  done
}

etc.
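
The "etc." could be filled in along these lines. This is only a sketch: `run_over`, `as_pdf_text` and `pdf_toc` are hypothetical names that belong to neither the tool nor the post above, and the `pdftotext` and `sed` invocations are copied from the configuration file.

```shell
#!/bin/sh
# run_over OUTPUT FILTER FILE...
# Hypothetical helper: runs FILTER on every FILE and collects the
# combined standard output in OUTPUT.
run_over() {
    out=$1
    filter=$2
    shift 2
    : > "$out"                       # ensure existence and zero length
    for file in "$@"; do
        [ -e "$file" ] || continue   # skip a glob that matched nothing
        "$filter" "$file" >> "$out"
    done
}

# Filter for the "as*-pdfs" resource group: layout-preserving text.
as_pdf_text() {
    pdftotext -layout "$1" -
}

# Filter for the "pdfs" group: keep only the table-of-contents range.
pdf_toc() {
    pdftotext -layout -nopgbrk "$1" - |
        sed -n "/Table of .*/,/Chapter/p"
}
```

With these definitions, the two resource groups would become `run_over pdfs.txt pdf_toc example/*.pdf` and `run_over as-pdfs.txt as_pdf_text example/as*.pdf`.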

P.S. please do what normal Lisp programmers do and close your
parens like )))))).

Jacek Grzebyta

1/7/2016 5:33:00 PM

Kaz Kylheku <kaz@kylheku.com> writes:

> On 2016-01-07, Jacek Grzebyta <jgrzebyta@users.sourceforge.net> wrote:
>> Hi,
>>
>> I have published on GitHub tree-analyser workflow tool at
>>
>> https://github.com/jgrzebyta/tre...
>
> "Do this shell command to these files in this directory" is much
> more easily achieved using ... the shell!
>
> Let us compare:
>
> (:id "root"
>  :directory "example/"
>  :resources (
>    (:id "pdfs"
>     :resources "*.pdf"
>     :parsers (
>       :commands ("pdftotext -layout -nopgbrk $FILENAME -"
>                  "sed -n \"/Table of .*/,/Chapter/p\" -"
>                  )
>       :output "pdfs.txt"
>       )
>     )
>    (:id "as*-pdfs"
>     :resources "as*.pdf"
>     :parsers (
>       :commands ("pdftotext -layout $FILENAME -")
>       :output "as-pdfs.txt"
>       )
>     )
>    )
>  )
>
> Versus:
>
> root()
> {
>   > pdfs.txt # ensure existence and zero length
>
>   for file in example/*.pdf ; do
>     pdftotext -layout -nopgbrk "$file" - |
>       sed -n "/Table of .*/,/Chapter/p" >> pdfs.txt
>   done
> }
>
> etc.
>
> P.S. please do what normal Lisp programmers do and close your
> parens like )))))).

Hi,

1. I wanted to do this in Lisp.
2. In my real project I have to meet more complex requirements:
- output files are consumed at least twice: converted into some final format and fed into another flow
- there is no single standard input file to work from, so I need to use many customised scripts
within flows
- flows have to execute in a particular order even if they are written in no particular order
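
For comparison, the ordering requirement on its own can be sketched in plain shell with the standard `tsort` utility. This is a minimal sketch under assumptions: the flow names and the `run_flow` placeholder are hypothetical, and nothing here reflects how tree-analyser itself orders flows.

```shell
#!/bin/sh
# Each input line "A B" means "flow A must finish before flow B starts".
# The flow names are hypothetical.
flow_order() {
    tsort <<'EOF'
feed1 convert1
feed2 convert2
convert1 merge
convert2 merge
merge to_rdf
EOF
}

# Placeholder for whatever actually executes one flow.
run_flow() { echo "running $1"; }

# Run the flows one after another in dependency order.
run_all() {
    flow_order | while read -r flow; do
        run_flow "$flow"
    done
}
```

`tsort` prints one valid order; when several orders satisfy the constraints, which one it prints may differ between implementations.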

Regards,
Jacek

Jacek Grzebyta

1/8/2016 1:22:00 PM

Kaz Kylheku <kaz@kylheku.com> writes:

> On 2016-01-07, Jacek Grzebyta <jgrzebyta@users.sourceforge.net> wrote:
>> Hi,
>>
>> I have published on GitHub tree-analyser workflow tool at
>>
>> https://github.com/jgrzebyta/tre...
>
> "Do this shell command to these files in this directory" is much
> more easily achieved using ... the shell!
>
> cut

Thanks for your comment. I thought about that overnight and I have another
argument in favour of Lisp: the piping system. In the future I would like to
replace the shell's simple pipe " | " with proper multiple-output pipes. That would
also avoid having many intermediate output files between tasks.
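
As a point of reference, a single output can be fed to two consumers in plain shell without an intermediate file by combining `tee` with a named pipe. This is a sketch under assumptions: `to_final_format` and `to_next_flow` are hypothetical stand-ins for real consumers, and `mktemp -u` is widely available but not specified by POSIX.

```shell
#!/bin/sh
# Hypothetical consumers standing in for "convert to the final format"
# and "pass to another flow".
to_final_format() { tr 'a-z' 'A-Z'; }
to_next_flow()    { wc -l; }

# split_stream OUT_A OUT_B
# Copies standard input to both consumers at once through a named pipe,
# so the shared data is never written to an intermediate file.
split_stream() {
    fifo=$(mktemp -u) || return 1        # mktemp -u: common, but not POSIX
    mkfifo "$fifo" || return 1
    to_next_flow < "$fifo" > "$2" &      # consumer B reads the named pipe
    tee "$fifo" | to_final_format > "$1" # consumer A reads the ordinary pipe
    wait                                 # let consumer B finish
    rm -f "$fifo"
}
```

This covers one producer with two consumers; it becomes unwieldy quickly as the number of branches grows.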

Actually, my final goal is to produce one large dataset (converted to RDF).

+-----------+          +------------+
| data feed |--------> | conversion |----+
|  task 1   |          |  tasks 1   |    |                 +------------+
+-----------+          +------------+    |                 | conversion |
                                         +--- merging ---> |   to RDF   |
+-----------+          +------------+    |                 +------------+
| data feed |--------> | conversion |----+
|  task 2   |          |  tasks 2   |
+-----------+          +------------+

The conversion tasks can be combined in different ways: in parallel or in sequence.
IMHO it is quite challenging to write that as a shell script.

> P.S. please do what normal Lisp programmers do and close your
> parens like )))))).

I just laid out the config file that way for non-Lisp experts, including me. :-)
The config file could be written much more simply anyway,
and it will change as I develop my real project.

Regards,
Jacek

PS Happy New Year to All

Pascal J. Bourguignon

1/9/2016 12:19:00 AM

Jacek Grzebyta <jgrzebyta@users.sourceforge.net> writes:

> Kaz Kylheku <kaz@kylheku.com> writes:
>
> Actually, my final goal is to produce one large dataset (converted to RDF).
>
> +-----------+          +------------+
> | data feed |--------> | conversion |----+
> |  task 1   |          |  tasks 1   |    |                 +------------+
> +-----------+          +------------+    |                 | conversion |
>                                          +--- merging ---> |   to RDF   |
> +-----------+          +------------+    |                 +------------+
> | data feed |--------> | conversion |----+
> |  task 2   |          |  tasks 2   |
> +-----------+          +------------+
>
> The conversion tasks can be combined in different ways: in parallel or in sequence.
> IMHO it is quite challenging to write that as a shell script.

You're cheating, with this:

-+
 |
 +-
 |
-+

You should draw:

   +---------+
-->|3        |
   | merging |-->
-->|4        |
   +---------+

instead.

If your merging process actually read input from multiple file
descriptors (and not only on stdin), then it would be trivial in bash to
redirect to those file descriptors with <&.

b | ( a | merge 3<&0 ) 4<&0

Granted, the syntax is a little strange, because | … n<&0 has to be
understood as a single operator. One would want to write n| instead…

b 4| ( a 3| merge )

Also, granted, the destination of the file descriptor 4 is implicit, one
cannot see if it's a or merge who will read from it.

But well, it's bash, what did you expect?

So for example you can have:

function merge (){
    read l <&3
    echo "$l"
    read l <&4
    echo "$l"
}

$ echo hello | ( echo world | merge 4<&0) 3<&0
hello
world

and your chain is trivial to write in bash:

( ( data-feed-task-1 | conversion-tasks-1 ) |
  ( ( data-feed-task-2 | conversion-tasks-2 ) | merging 3<&0 ) 4<&0 ) |
  conversion-to-rdf
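
A runnable version of that chain, as a sketch only: every stage below is a hypothetical stand-in written as a shell function (with underscores instead of hyphens so that plain `sh` accepts the names), and `merging` simply drains file descriptor 3 and then file descriptor 4.

```shell
#!/bin/sh
# Hypothetical stand-ins for the stages in the diagram.
data_feed_task_1()   { printf 'a\nb\n'; }
data_feed_task_2()   { printf 'c\nd\n'; }
conversion_tasks_1() { sed 's/^/1:/'; }
conversion_tasks_2() { sed 's/^/2:/'; }
conversion_to_rdf()  { sed 's/$/ ./'; }

# Reads everything from fd 3, then everything from fd 4.
merging() {
    cat <&3
    cat <&4
}

pipeline() {
    ( ( data_feed_task_1 | conversion_tasks_1 ) |
      ( ( data_feed_task_2 | conversion_tasks_2 ) | merging 3<&0 ) 4<&0 ) |
      conversion_to_rdf
}
```

In this arrangement branch 2 reaches `merging` on descriptor 3 and branch 1 on descriptor 4, so `pipeline` prints the branch 2 lines first. The redirection `4<&0` has to sit on the same line as the closing parenthesis it applies to; on a line of its own it would be parsed as a separate, empty command.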

The problem is rather that very few unix programs take more than one
input file, so we don't often have the opportunity to write such
redirections. The only instance I have come across was a program that took a
password from a file descriptor indicated in the arguments; perhaps it
was an old version of gpg?

--
__Pascal Bourguignon__ http://www.informat...
“The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment.” -- Carl Bass CEO Autodesk

Jacek Grzebyta

1/15/2016 3:34:00 PM

"Pascal J. Bourguignon" <pjb@informatimago.com> writes:

Dear Pascal and Kaz,

Thanks a lot for advice.
>
> b | ( a | merge 3<&0 ) 4<&0
>
> Granted, the syntax is a little strange, because | … n<&0 has to be
> understood as a single operator. One would want to write n| instead…
>

I am not sure what the order of execution is.

> b 4| ( a 3| merge )

When I put that into a script, I get a "bad file descriptor" error:

echo hello 3| (echo world 4| merge)

> The problem is rather that very few unix programs take more than one
> input file, so we don't often have the opportunity to write such
> redirections. The only instance I have come across was a program that took a
> password from a file descriptor indicated in the arguments; perhaps it
> was an old version of gpg?

In my real project I have multiple ins and outs. :-( That is why I
wanted to use Lisp. Actually I am moving toward Clojure, as I can use
some Java stuff which is not supported by Common Lisp.

Regards,
Jacek

Pascal J. Bourguignon

1/15/2016 8:47:00 PM

Jacek Grzebyta <jgrzebyta@users.sourceforge.net> writes:

> "Pascal J. Bourguignon" <pjb@informatimago.com> writes:
>
>
> Dear Pascal and Kaz,
>
> Thanks a lot for advice.
>>
>> b | ( a | merge 3<&0 ) 4<&0
>>
>> Granted, the syntax is a little strange, because | … n<&0 has to be
>> understood as a single operator. One would want to write n| instead…
>>
>
> I am not sure what the order of execution is.

Those | are shell pipes. Therefore the order of execution is "in
parallel".
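
A quick way to see this, as a sketch: the helper names `both_sides` and `elapsed` are made up for the illustration, and `date +%s` is widely supported but not guaranteed by POSIX.

```shell
#!/bin/sh
# Both sides of a pipe are started together, and the shell waits for
# all of them. Two 2-second sleeps therefore take about 2 seconds in
# total, not 4.
both_sides() {
    sleep 2 | sleep 2
}

# elapsed CMD...: print the number of whole seconds CMD took to run.
elapsed() {
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    echo $((end - start))
}
```

Running `elapsed both_sides` should report roughly 2, whereas running the two sleeps one after the other would take about 4.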

>> b 4| ( a 3| merge )
>
> When I put that into a script, I get a "bad file descriptor" error:
>
> echo hello 3| (echo world 4| merge)

Yes, because this syntax doesn't exist in bash. This is what we WOULD
want, but what we don't have.

To have it, you WOULD have to get the sources of bash, read them, notice
some bug that has lurked in there for fifteen years, correct it, and
then you could add this syntax to improve bash.

>> The problem is rather that very few unix programs take more than one
>> input file, so we don't often have the opportunity to write such
>> redirections. The only instance I have come across was a program that took a
>> password from a file descriptor indicated in the arguments; perhaps it
>> was an old version of gpg?
>
> In my real project I have multiple ins and outs. :-( That is why I
> wanted to use Lisp. Actually I am moving toward Clojure, as I can use
> some Java stuff which is not supported by Common Lisp.

No, don't use Clojure. Use ABCL. So you can use some Java stuff, and
all of Common Lisp stuff.


comp.lang.lisp
