
comp.lang.lisp

using LISTEN as end of stream test

Mark Tarver

5/7/2015 3:02:00 PM

I want a non-destructive end of file test for an input byte stream (SBCL) from a file. Looking at CLTL etc. I find LISTEN. However I find that having read the last byte off the stream, LISTEN returns T. Only when I then try to read the next (non-existent) byte does LISTEN wise up to the fact that the stream is empty. Is this expected behaviour? Is there a workaround or a better substitute in the docs?

thx

Mark
11 Answers

Madhu

5/7/2015 3:18:00 PM



* Mark Tarver <4fd00883-cb3b-4315-a4a3-ebadf6f14ac1@googlegroups.com> :
Wrote on Thu, 7 May 2015 08:02:14 -0700 (PDT):

| I want a non-destructive end of file test for an input byte stream
| (SBCL) from a file. Looking at CLTL etc. I find LISTEN. However I
| find that having read the last byte off the stream, LISTEN returns T.
| Only when I then try to read the next (non-existent) byte does LISTEN
| wise up to the fact that the stream is empty. Is this expected
| behaviour? Is there a workaround or a better substitute in the docs?

It's not expected behaviour, it's a bug. You can check this yourself
with any other implementation. ---Madhu

Barry Margolin

5/7/2015 4:14:00 PM


In article <m3wq0kbenn.fsf@leonis4.robolove.meer.net>,
Madhu <enometh@meer.net> wrote:

> * Mark Tarver <4fd00883-cb3b-4315-a4a3-ebadf6f14ac1@googlegroups.com> :
> Wrote on Thu, 7 May 2015 08:02:14 -0700 (PDT):
>
> | I want a non-destructive end of file test for an input byte stream
> | (SBCL) from a file. Looking at CLTL etc. I find LISTEN. However I
> | find that having read the last byte off the stream, LISTEN returns T.
> | Only when I then try to read the next (non-existent) byte does LISTEN
> | wise up to the fact that the stream is empty. Is this expected
> | behaviour? Is there a workaround or a better substitute in the docs?
>
> It's not expected behaviour, it's a bug. You can check this yourself
> with any other implementation. ---Madhu

Although one could argue that the function is not really useful as
specified. If you don't try to read from the file until LISTEN returns
true, how will you discover that you've reached EOF?

Why are you using LISTEN on a file stream, anyway? The spec says:
"listen is intended to be used when input-stream obtains characters from
an interactive device such as a keyboard."

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

Kaz Kylheku

5/7/2015 4:30:00 PM


On 2015-05-07, Mark Tarver <dr.mtarver@gmail.com> wrote:
> I want a non-destructive end of file test for an input byte stream

Pascal is down the hall to your left. (The language, not Costanza or
Bourguignon.)

Modern operating systems do not provide this, so if you want that in a language
it has to work by actually peeking for input and testing for failure.

Note that it is not really possible --- for every kind of device --- to report,
without false negatives "the data has reached the end", if it is to be a boolean
predicate.

Consider a network socket. Suppose the remote end sends 8 bytes and then
pauses. The local client reads all 8 bytes right up to the application level.
Now it wants to know, is the connection at EOF? is_at_eof(socket)?

The correct answer is "unknown". It has to be three-valued.

Case 1: more bytes have been sent and are queued.
EOF -> false

Case 2: no bytes are pending, but a connection closure has not been received.
EOF -> indeterminate

Case 3: no bytes are pending, and a connection closure has been received.
(FIN packet in TCP or whatever).
EOF -> true

If you want to reduce EOF to a reliable boolean test, then it has to be
a blocking operation. In the true and false cases, it can return immediately.
In the indeterminate case, it has to wait for an event to arrive, which
will turn case 2 into either case 1 or case 3.
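
The three cases above can be modelled as a small function. This is only an
illustrative sketch: SOCKET-EOF-STATUS and its two arguments (a count of
queued bytes and a flag saying whether a connection closure has been
received) are hypothetical names, not part of any socket library.

```lisp
;; Hypothetical model of the three-valued EOF answer described above.
;; QUEUED-BYTES: number of bytes already received and waiting to be read.
;; CLOSED-P:     true once the peer's connection closure has been received.
(defun socket-eof-status (queued-bytes closed-p)
  (cond ((plusp queued-bytes) :false)    ; case 1: data is pending
        (closed-p             :true)     ; case 3: nothing pending, closed
        (t                    :unknown))) ; case 2: cannot tell yet
```

A boolean EOF predicate built on this would have to block while the answer
is :UNKNOWN.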

Pascal J. Bourguignon

5/7/2015 4:35:00 PM


Madhu <enometh@meer.net> writes:

> * Mark Tarver <4fd00883-cb3b-4315-a4a3-ebadf6f14ac1@googlegroups.com> :
> Wrote on Thu, 7 May 2015 08:02:14 -0700 (PDT):
>
> | I want a non-destructive end of file test for an input byte stream
> | (SBCL) from a file. Looking at CLTL etc. I find LISTEN. However I
> | find that having read the last byte off the stream, LISTEN returns T.
> | Only when I then try to read the next (non-existent) byte does LISTEN
> | wise up to the fact that the stream is empty. Is this expected
> | behaviour? Is there a workaround or a better substitute in the docs?
>
> It's not expected behaviour, it's a bug. You can check this yourself
> with any other implementation. ---Madhu

AFAICS, there's a bug in SBCL, but it's not what you believe.

LISTEN is specified as:

Returns true if there is a character immediately available from
input-stream; otherwise, returns false.

On a non-interactive input-stream, listen returns true except when
at end of file[1]. If an end of file is encountered, listen returns
false.

listen is intended to be used when input-stream obtains characters
from an interactive device such as a keyboard.


We must take into account foremost the first sentence: is a character
immediately available from input-stream? For file-streams, this can be
known non-destructively only if the stream is buffered, and if there is
a character in the input buffer.

Similarly for a buffered interactive stream (eg. a terminal in line
mode).

But for unbuffered streams (interactive, socket streams, etc), unless
you just called UNREAD-CHAR, we cannot assume either result.


The problem is that in general, end-of-file is not a state, it's an
event, despite the language used in the above description.

For example, in the case of file-streams on unix systems, the file can
grow at any time. Therefore if the implementation returned NIL to a
file-stream when (= (file-position s) (file-length s)), it could be
wrong, because between the time LISTEN returns and the time you test it,
another process may have added bytes to the file, and now we would have
(< (file-position s) (file-length s)). On the other hand, if it
returned true, then if no other process writes to the file, you will
encounter an end-of-file condition when you try to read. Or perhaps not,
if you wait too long and eventually another process writes to the file.

It could be argued that NIL is a better result to return when
(= (file-position s) (file-length s)), therefore sbcl is good.


For string streams, a definite answer can be given:

[pjb@kuiper :0.0 tmp]$ clall -r '(with-input-from-string (in "") (listen in))'

Armed Bear Common Lisp --> NIL
Clozure Common Lisp --> NIL
CLISP --> NIL
CMU Common Lisp --> NIL
ECL --> NIL
SBCL --> NIL


For unix file streams, NIL is a better answer, but implementations may
return T so as not to discourage programs from trying to read (and
finding out whether an end-of-file condition is actually signaled or not):

[pjb@kuiper :0.0 tmp]$ touch /tmp/empty
[pjb@kuiper :0.0 tmp]$ clall -r '(with-open-file (in "/tmp/empty") (listen in))'

Armed Bear Common Lisp --> NIL
Clozure Common Lisp --> T
CLISP --> NIL
CMU Common Lisp --> T
ECL --> T
SBCL --> NIL

The real fun begins with another process:

[pjb@kuiper :0.0 tmp]$ while sleep 1 ; do if [ -r empty ] ; then echo hi >> empty ; sleep 2 ; fi ; done &
[1] 15417
[pjb@kuiper :0.0 tmp]$ clall -r '(progn
    (ignore-errors (delete-file "/tmp/empty"))
    (sleep 2)
    (with-open-file (in "/tmp/empty" :direction :output
                                     :if-does-not-exist :create
                                     :if-exists :append))
    (with-open-file (in "/tmp/empty")
      (list (file-length in) (listen in)
            (sleep 2)
            (file-length in) (listen in))))'

Armed Bear Common Lisp --> (0 NIL NIL 3 NIL)
Clozure Common Lisp --> (0 T NIL 0 T)
CLISP --> (0 NIL NIL 3 T)
CMU Common Lisp --> (0 T NIL 3 T)
ECL --> (0 T NIL 3 T)
SBCL --> (0 NIL NIL 3 NIL)

[pjb@kuiper :0.0 tmp]$

And there, we can see that both ABCL and SBCL have a bug, because LISTEN
doesn't return true when the file length has become greater than the file
position.


In any case, LISTEN is not the right tool to use.

For one thing, it's not clear what LISTEN should do on binary streams.
The argument type specified is "input stream designator", but the
description implies that LISTEN only works on character streams.

(On the other hand, FILE-LENGTH only works reliably on byte streams, not
on character streams).


Instead of LISTEN, you will have to use READ-BYTE, and test for the
END-OF-FILE condition. Since there is no UNREAD-BYTE in the standard,
only an UNREAD-CHAR, you won't be able to unread the byte with standard
operators. But the flexi-streams library has an UNREAD-BYTE.
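
A minimal sketch of such a workaround using only standard operators: wrap
the stream in a structure holding one byte of lookahead, so that "peeking"
is a READ-BYTE whose result is remembered. The names BYTE-PEEKER,
PEEKER-PEEK-BYTE, PEEKER-READ-BYTE and PEEKER-EOF-P are made up for
illustration; this is not the flexi-streams API.

```lisp
;; Hypothetical helper, not part of the standard or of any library.
(defstruct (byte-peeker (:constructor make-byte-peeker (source)))
  source          ; the underlying binary input stream
  (pending nil))  ; one byte of lookahead, or NIL when nothing is held

(defun peeker-peek-byte (peeker)
  "Return the next byte without consuming it, or NIL at end of file."
  (or (byte-peeker-pending peeker)
      (setf (byte-peeker-pending peeker)
            ;; EOF-ERROR-P is NIL, so end of file yields NIL, not an error.
            (read-byte (byte-peeker-source peeker) nil nil))))

(defun peeker-read-byte (peeker)
  "Consume and return the next byte, or NIL at end of file."
  (prog1 (peeker-peek-byte peeker)
    (setf (byte-peeker-pending peeker) nil)))

(defun peeker-eof-p (peeker)
  "True if no further byte could be read at the time of the call."
  (null (peeker-peek-byte peeker)))
```

The test still reads from the underlying stream, so it would block on an
interactive stream, and FILE-POSITION on the wrapped stream runs one byte
ahead while a byte is pending.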

Alternatively, you may use
(= (file-position s) (file-length s))
but this result may be invalidated as soon as you get it, in presence of
other processes modifying the file.

Also, remember that on unix, the file position can be beyond the file
size (so that you may write sparse files), so if another process
truncates the file, you may also have
(> (file-position s) (file-length s)).
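
A hedged sketch of that comparison as a predicate, assuming a binary file
stream on an implementation where FILE-POSITION and FILE-LENGTH are counted
in the same units (the standard does not guarantee this). The name
AT-EOF-SNAPSHOT-P is hypothetical; >= rather than = covers the
truncated-file case.

```lisp
;; Hypothetical helper: a snapshot test, valid only at the moment of the call.
(defun at-eof-snapshot-p (stream)
  "True if STREAM's position was at or beyond its length when asked."
  (let ((len (file-length stream))
        (pos (file-position stream)))
    ;; Either function may return NIL when the value cannot be determined;
    ;; in that case this predicate answers NIL as well.
    (and len pos (>= pos len))))
```

The answer can be stale as soon as it is returned if another process is
modifying the file.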


--
__Pascal Bourguignon__ http://www.informat...
"The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment." -- Carl Bass CEO Autodesk

William James

5/7/2015 8:20:00 PM


Dr. Mark Tarver wrote:

> I want a non-destructive end of file test
> for an input byte stream (SBCL) from a file.
> Looking at CLTL etc. I find LISTEN.
> However I find that having read the last
> byte off the stream, LISTEN returns T. Only
> when I then try to read the next
> (non-existent) byte does LISTEN wise up to
> the fact that the stream is empty. Is this
> expected behaviour? Is there a workaround
> or a better substitute in the docs?


Gauche Scheme:

(with-input-from-file "7z.dll"
  (lambda ()
    (let go ((cnt 0))
      (if (eof-object? (peek-byte))
          cnt
          (begin (read-byte)
                 (go (+ 1 cnt)))))))

===>
713728


Madhu

5/8/2015 1:02:00 AM



* "Pascal J. Bourguignon" <87d22cfir7.fsf@kuiper.lan.informatimago.com> :
Wrote on Thu, 07 May 2015 18:35:24 +0200:

| We must take into account foremost the first sentence: is a character
| immediately available from input-stream? For file-streams, this can be
| known non-destructively only if the stream is buffered, and if there is
| a character in the input buffer.

No. For file streams the file size and file position are well known.
The stream implementation accounting keeps track of which position is
being read whether the stream is buffered or unbuffered. The
end-of-file is entirely determinable.


| Similarly for a buffered interactive stream (eg. a terminal in line
| mode).
|
| But for unbuffered streams (interactive, socket streams, etc), unless
| you just called UNREAD-CHAR, we cannot assume either result.

An implementation of UNREAD-CHAR can again use a single-character
buffer. This is all very hokey while the spec is precise.

| The problem is that in general, end-of-file is not a state, it's an
| event, despite the language used in the above description.

In the case of file-streams end-of-file is a state, which is reached
when the last character or byte has been read.

| For example, in the case of file-streams on unix systems, the file can
| grow at any time. Therefore if the implementation returned NIL to a
| file-stream when (= (file-position s) (file-length s)), it could be
| wrong, because between the time LISTEN returns and the time you test it,
| another process may have added bytes to the file, and now we would have
| (< (file-position s) (file-length s)).

Wrong. FILE-LENGTH would reflect the new length and the call to
LISTEN would reflect the present state.

| On the other hand, if it returned true, then if no other process
| writes to the file, you will encounter an end-of-file condition when
| you try to read. Or perhaps not, if you wait too long and eventually
| another process writes to the file.

These are peripheral issues. There are any number of hypothetical
situations and solutions --- what if the file is truncated, file locks,
ccl making streams exclusive to processes, none of these affect the
specification of LISTEN. The behaviour of SBCL is a bug UNIX or not.

| It could be argued that NIL is a better result to return when
| (= (file-position s) (file-length s)), therefore sbcl is good.

It would be a specious argument typical of SBCL developers which should
set the warning bells ringing in the mind of any developer with
understanding and reason and make him seek another implementation.

--- Madhu

Pascal J. Bourguignon

5/8/2015 2:22:00 AM


Madhu <enometh@meer.net> writes:

> * "Pascal J. Bourguignon" <87d22cfir7.fsf@kuiper.lan.informatimago.com> :
> Wrote on Thu, 07 May 2015 18:35:24 +0200:
>
> | We must take into account foremost the first sentence: is a character
> | immediately available from input-stream? For file-streams, this can be
> | known non-destructively only if the stream is buffered, and if there is
> | a character in the input buffer.
>
> No. For file streams the file size and file position are well known.

No. For character file streams, the file length is not specified,
therefore there can be no conforming usage of file-length on character
file streams.


file-length returns the length of stream, or nil if the length
cannot be determined.

For a binary file, the length is measured in units of the element
type of the stream.

Notice how nothing is said specifically for text files, so only the
first paragraph applies. Nowhere is the length of stream defined.

If you assume as is hinted for binary files, that for text files the
length is the number of characters, then you can see that
implementations "get your assumptions wrong":


[pjb@kuiper :0.0 tmp]$ clall -r '(let ((path (merge-pathnames
                 #P"./tmp/misc/wang.utf-8"
                 (user-homedir-pathname)))
        fl cc)
    (with-open-file (in path
                        :external-format
                        #+clisp charset:utf-8
                        #-clisp :utf-8)
      (setf fl (file-length in))
      (setf cc (length
                (with-output-to-string
                    (*standard-output*)
                  (loop for ch = (read-char in nil nil)
                        while ch
                        do (princ ch))))))
    (list fl cc))'

Armed Bear Common Lisp --> (1511 1468)
Clozure Common Lisp --> (1511 1468)
CLISP --> (1511 1468)
CMU Common Lisp --> (1511 1468)
ECL --> (1511 1468)
SBCL --> (1511 1468)



Also, while I've proposed (= (file-position s) (file-length s)), and
while it would work on most implementations, this IS NOT A CONFORMING
expression at all, since file-position is only defined in terms of
itself as a monotonically incrementing integer, without any relationship
to the file-length or anything.

Basically, what you would have to do if you want to use file-position,
is to read the file first, detect the end-of-file condition, note the
current file-position as the position of the end of the file, and then
you could move back and read again, comparing the file positions with
the previously obtained one.
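
That procedure could look like the following sketch, assuming a binary
input file stream that supports repositioning. MEASURED-EOF-POSITION is a
hypothetical name. The value it returns is only meaningful for comparison
with later FILE-POSITION results on the same stream, and only for as long
as nobody else changes the file.

```lisp
;; Hypothetical helper: find the end-of-file position by actually reading.
(defun measured-eof-position (stream)
  "Read STREAM to end of file, note FILE-POSITION there, then move back."
  (let ((start (file-position stream)))
    ;; READ-BYTE with EOF-ERROR-P NIL returns NIL at end of file.
    (loop while (read-byte stream nil nil))
    (let ((end (file-position stream)))
      (file-position stream start) ; go back to where we started
      end)))
```

Afterwards, (eql (file-position s) end) serves as the non-destructive test
while reading the same stream again.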


> The stream implementation accounting keeps track of which position is
> being read whether the stream is buffered or unbuffered. The
> end-of-file is entirely determinable.

At one point in time, but one nanosecond later, it's wrong. That's the
problem with concurrent and multi-process multi-user systems.


> | Similarly for a buffered interactive stream (eg. a terminal in line
> | mode).
> |
> | But for unbuffered streams (interactive, socket streams, etc), unless
> | you just called UNREAD-CHAR, we cannot assume either result.
>
> An implementation of UNREAD-CHAR can again use a single-character
> buffer. This is all very hokey while the spec is precise.

The OP asked for BINARY streams!


> | The problem is that in general, end-of-file is not a state, it's an
> | event, despite the language used in the above description.
>
> In the case of file-streams end-of-file is a state, which is reached
> when the last character or byte has been read.

No, as I explained in detail, in unix, a file can be modified while it
is being open and read, be it by other processes or by the same process!
Therefore as soon as a syscall returns, any value it may give you about
the state of the file system can and will be wrong!

Learn your unix!



> | For example, in the case of file-streams on unix systems, the file can
> | grow at any time. Therefore if the implementation returned NIL to a
> | file-stream when (= (file-position s) (file-length s)), it could be
> | wrong, because between the time LISTEN returns and the time you test it,
> | another process may have added bytes to the file, and now we would have
> | (< (file-position s) (file-length s)).
>
> Wrong. FILE-LENGTH would reflect the new length and the call to
> LISTEN would reflect the present state.

How many times do we have to repeat that with multi-processes (and now
multi-core! processors), things can be changed behind your back!?!


> | On the other hand, if it returned true, then if no other process
> | writes to the file, you will encounter an end-of-file condition when
> | you try to read. Or perhaps not, if you wait too long and eventually
> | another process writes to the file.
>
> These are peripheral issues.

Nope, this is the core of the issue when programming on unix systems.

> There are any number of hypothetical
> situations and solutions --- what if the file is truncated, file locks,
> ccl making streams exclusive to processes, none of these affect the
> specification of LISTEN. The behaviour of SBCL is a bug UNIX or not.

Nope. I've demonstrated explicitly that the behavior complained about
wasn't a bug in sbcl (but that sbcl has another bug).


> | It could be argued that NIL is a better result to return when
> | (= (file-position s) (file-length s)), therefore sbcl is good.
>
> It would be a specious argument typical of SBCL developers which should
> set the warning bells ringing in the mind of any developer with
> understanding and reason and make him seek another implementation.

I'm not an sbcl maintainer.


--
__Pascal Bourguignon__ http://www.informat...
"The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment." -- Carl Bass CEO Autodesk

Madhu

5/8/2015 4:10:00 AM



* "Pascal J. Bourguignon" <87oalverla.fsf@kuiper.lan.informatimago.com> :
Wrote on Fri, 08 May 2015 04:22:09 +0200:

|> | We must take into account foremost the first sentence: is a character
|> | immediately available from input-stream? For file-streams, this can be
|> | known non-destructively only if the stream is buffered, and if there is
|> | a character in the input buffer.
|>
|> No. For file streams the file size and file position are well known.
|
| No. For character file streams, the file length is not specified,
| therefore there can be no conforming usage of file-length on character
| file streams.

Are you disputing that the file has a certain length? This has nothing
to do with CL's FILE-LENGTH. Are you disputing there is a
file-position? This has nothing to do with CL's FILE-POSITION.


| file-length returns the length of stream, or nil if the length
| cannot be determined.
|
| For a binary file, the length is measured in units of the element
| type of the stream.
|
| Notice how nothing is said specifically for text files, so only the
| first paragraph applies. Nowhere is the length of stream defined.

You are purposely deviously confounding CL's user definitions of
FILE-LENGTH and FILE-POSITION with the underlying concepts of file
length and file position. The underlying concepts of file-length and
file-position unambiguously tell you whether or not you are at an
end-of-file situation, regardless of whether this is a character stream
or a binary stream or a bivalent hybrid stream or any extended stream.

<snip>


|> The stream implementation accounting keeps track of which position is
|> being read whether the stream is buffered or unbuffered. The
|> end-of-file is entirely determinable.
|
| At one point in time, but one nanosecond later, it's wrong. That's the
| problem with concurrent and multi-process multi-user systems.

It makes no difference. That race conditions exist is unrelated. The
specification relates to the state at the point the call is made.

|> | Similarly for a buffered interactive stream (eg. a terminal in line
|> | mode).
|> |
|> | But for unbuffered streams (interactive, socket streams, etc), unless
|> | you just called UNREAD-CHAR, we cannot assume either result.
|>
|> An implementation of UNREAD-CHAR can again use a single-character
|> buffer. This is all very hokey while the spec is precise.
|
| The OP asked for BINARY streams!

The unread-char strawman was yours, not mine.

|> | The problem is that in general, end-of-file is not a state, it's an
|> | event, despite the language used in the above description.
|>
|> In the case of file-streams end-of-file is a state, which is reached
|> when the last character or byte has been read.
|
| No, as I explained in detail, in unix, a file can be modified while it
| is being open and read, be it by other processes or by the same process!
| Therefore as soon as a syscall returns, any value it may give you about
| the state of the file system can and will be wrong!
|
| Learn your unix!

It does not matter, your hypothetical situations do not apply. The
problem is the same in the alternative you want to provide. Your file
will signal an END-OF-FILE and the next nanosecond you could have
written to it so it is no longer at the END OF the FILE.

This does not make the END-OF-FILE signalling wrong at the time it was
signalled.

Do you see where your flawed reasoning leads? This is only going in
circles now. ---Madhu


Mark Tarver

5/8/2015 3:21:00 PM


On Thursday, May 7, 2015 at 9:20:57 PM UTC+1, WJ wrote:
> Dr. Mark Tarver wrote:
>
> > I want a non-destructive end of file test
> > for an input byte stream (SBCL) from a file.
> > Looking at CLTL etc. I find LISTEN.
> > However I find that having read the last
> > byte off the stream, LISTEN returns T. Only
> > when I then try to read the next
> > (non-existent) byte does LISTEN wise up to
> > the fact that the stream is empty. Is this
> > expected behaviour? Is there a workaround
> > or a better substitute in the docs?
>
>
> Gauche Scheme:
>
> (with-input-from-file "7z.dll"
>   (lambda ()
>     (let go ((cnt 0))
>       (if (eof-object? (peek-byte))
>           cnt
>           (begin (read-byte)
>                  (go (+ 1 cnt)))))))
>
> ===>
> 713728
>

What we have in this case is a simple byte stream from a file, with no new information being sent down the source by some event generator such as a person. PEEK-CHAR doesn't hack it because I have a byte stream, and it just signals an error. It seems to me that this is a hole in the CL spec. With UNREAD-BYTE this could be solved.

I can and do use READ-BYTE to return an error object -1 but for certain apps this is not a good approach - in particular certain elegant recursive programs that combine parsing and reading become possible when non-destructive EOF testing is available.

Looks like I'll have to write workaround code.

thx for your time

Mark

Pascal J. Bourguignon

5/8/2015 4:14:00 PM


Madhu <enometh@meer.net> writes:

> This does not make the END-OF-FILE signalling wrong at the time it was
> signalled.

No. It makes it wrong one nanosecond later.
--
__Pascal Bourguignon__ http://www.informat...
"The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment." -- Carl Bass CEO Autodesk