Asp Forum - POSIX semaphores: sem_wait fails when run in debugger?

Joe Knapka

8/9/2004 6:38:00 PM

Hi folks,

(Platform is Fedora Core 1, unmodified, running on a crufty old
no-name P2/233.)

Google has not helped me answer the following question:

sem_wait(sem_t*) seems to consistently return -1 when my code executes
inside gdb, while acting normally (that is, returning 0 as documented)
when run outside the debugger. I am certain the semaphore is being
initialized, although the call to sem_init() occurs in one thread and
the call to sem_wait() occurs in another thread which is started after
the sem_init() call (I'd expect that to be a usual situation). Is
there any known issue with POSIX semaphores and GDB? I can work
around this if necessary, but it's rather annoying and mysterious (a
dreadful combination).

Thanks,

-- Joe

--
"We sat and watched as this whole <-- (Died Pretty -- "Springenfall")
blue sky turned to black..."
.... Re-defeat Bush in '04.
--
pub 1024D/BA496D2B 2004-05-14 Joseph A Knapka
Key fingerprint = 3BA2 FE72 3CBA D4C2 21E4 C9B4 3230 94D7 BA49 6D2B
If you really want to get my attention, send mail to
jknapka .at. kneuro .dot. net.

20 Answers

Kasper Dupont

8/9/2004 7:22:00 PM

Joe Knapka wrote:
>
> Hi folks,
>
> (Platform is Fedora Core 1, unmodified, running on a crufty old
> no-name P2/233.)

I guess on such hardware you must be using the i386
version of glibc. That glibc version is buggy and will
cause programs to crash at random unless you disable
exec-shield-randomize.

You should add kernel.exec-shield-randomize = 0
to your /etc/sysctl.conf, that will solve some problems.

I don't know if this is related to your problem. But
the bug I have seen appears to be one threading library
making incorrect assumptions about the stack location.
AFAIK the i386 and i686 versions of glibc use different
threading libraries, so it is very likely that your
problem would not show up on a Fedora Core 1 running on
newer hardware.

--
Kasper Dupont -- der bruger for meget tid paa usenet.
I'd rather be a hammer than a nail.

Joe Knapka

8/9/2004 7:42:00 PM

Joe Knapka <jknapka@kneuro.net> writes:

> Hi folks,
>
> (Platform is Fedora Core 1, unmodified, running on a crufty old
> no-name P2/233.)
>
> Google has not helped me answer the following question:
>
> sem_wait(sem_t*) seems to consistently return -1 when my code executes
> inside gdb, while acting normally (that is, returning 0 as documented)
> when run outside the debugger.

More specifically, this behavior occurs when the program is being
stepped in gdb, but only (as far as I can tell) if a "step" command is
performed in a different thread than the one doing the sem_wait()
call. Here is some code that demonstrates:

// File: semtest.c
#include <semaphore.h>
#include <pthread.h>
#include <unistd.h>   /* sleep() */
#include <stdio.h>

/* Thread body: block on the semaphore and report any failure. */
void* entry_point(void* arg) {
    sem_t* psem = (sem_t*)arg;
    int rc = sem_wait(psem);
    if (rc != 0) printf("Strange: sem_wait() returned %d\n",rc);
    return 0;
}

int main(void) {
    pthread_t th;
    sem_t sem;
    int rc = sem_init(&sem,0,0);
    if (rc != 0) {
        printf("sem_init() failed: %d\n",rc);
        return 1;
    }
    pthread_create(&th,0,entry_point,(void*)&sem);
    sleep(120);
    sem_destroy(&sem);
    return 0;
}

If I compile that code (g++ -o semtest semtest.c -lpthread) and run it
without breakpoints, either by itself or within GDB, all is well. If I
set a breakpoint on the sem_init() call and then step through main(), I
see the "Strange" message when I step over the sleep(120) call, or
sometimes immediately after the pthread_create() call. However, if I
set a breakpoint in entry_point() and step over the sem_wait() call
itself, it seems to work fine.

Am I mis-using the semaphore API in some way? Or is this just an
unfortunate interaction between gdb and semaphores? If so, is there a
workaround?

Thanks,

-- Joe


Joe Knapka

8/9/2004 8:05:00 PM

Kasper Dupont <remove.invalid@nospam.lir.dk.invalid> writes:

> Joe Knapka wrote:
> >
> > Hi folks,
> >
> > (Platform is Fedora Core 1, unmodified, running on a crufty old
> > no-name P2/233.)
>
> I guess on such hardware you must be using the i386
> version of glibc. That glibc version is buggy and will
> cause programs to crash at random unless you disable
> exec-shield-randomize.
>
> You should add kernel.exec-shield-randomize = 0
> to your /etc/sysctl.conf, that will solve some problems.
>
> I don't know if this is related to your problem. But
> the bug I have seen appears to be one threading library
> making incorrect assumptions about the stack location.
> AFAIK the i386 and i686 version of glibc use different
> threading libraries, so it is very likely that your
> problem would not show up on a Fedora Core 1 running on
> newer hardware.

Hmm. I tried this, but it didn't help. I surmise that what's going on
here is that the sem_wait() call is being interrupted by a signal sent
by gdb during the step operation. That's kind of a bummer. I guess
one solution would be to do "while (sem_wait());" rather than plain
sem_wait(). (In fact that does seem to work OK.) Or perhaps there is
a way to do it in a less slipshod fashion, by masking signals on the
thread doing the sem_wait. Any advice is appreciated. I've done a
good bit of threaded programming on win32, but this is my first foray
into POSIX threadland. The Glibc docs on semaphores seem incomplete;
for example, they make the blanket statement "sem_wait() always returns
0", which is clearly false.

Thanks,

-- Joe
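
On the idea of masking signals in the waiting thread, here is a minimal sketch of what that could look like. It relies on the fact that a new thread inherits its creator's signal mask. SIGUSR1 and the names run_masked_wait, masked_waiter and struct waiter are illustrative assumptions, not anything from the code above. Whether this helps under gdb is not established here: the interruption seen while stepping may come from the debugger stopping and resuming the thread, which a signal mask may not prevent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stddef.h>

static void noop_handler(int signo) { (void)signo; }

struct waiter {
    sem_t sem;
    int rc;     /* sem_wait() return value */
    int err;    /* errno after sem_wait(), if it failed */
};

static void *masked_waiter(void *arg) {
    struct waiter *w = arg;
    assert(w != NULL);
    w->rc = sem_wait(&w->sem);
    w->err = (w->rc == 0) ? 0 : errno;
    return NULL;
}

/* Returns 0 if the waiter's sem_wait() succeeded despite the signals. */
int run_masked_wait(void) {
    struct waiter w;
    struct sigaction sa;
    sigset_t block, old;
    pthread_t th;
    int crc, i;

    w.rc = -1;
    w.err = 0;
    sa.sa_handler = noop_handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;
    if (sem_init(&w.sem, 0, 0) != 0) return -1;

    /* A new thread inherits its creator's mask, so block before creating. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &block, &old);
    crc = pthread_create(&th, NULL, masked_waiter, &w);
    pthread_sigmask(SIG_SETMASK, &old, NULL);   /* creator restores its mask */
    if (crc != 0) { sem_destroy(&w.sem); return -1; }

    /* Thread-directed signals stay pending and cannot interrupt the wait. */
    for (i = 0; i < 3; i++) pthread_kill(th, SIGUSR1);
    sem_post(&w.sem);
    pthread_join(th, NULL);
    sem_destroy(&w.sem);
    return (w.rc == 0) ? 0 : -1;
}
```

Blocking the signal before pthread_create() avoids a window in which the new thread runs with the signal still unblocked.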


boa

8/9/2004 8:23:00 PM

Joe Knapka wrote:

> Kasper Dupont <remove.invalid@nospam.lir.dk.invalid> writes:
>
>
>>Joe Knapka wrote:
>>
>>>Hi folks,
>>>
>>>(Platform is Fedora Core 1, unmodified, running on a crufty old
>>>no-name P2/233.)
>>
>>I guess on such hardware you must be using the i386
>>version of glibc. That glibc version is buggy and will
>>cause programs to crash at random unless you disable
>>exec-shield-randomize.
>>
>>You should add kernel.exec-shield-randomize = 0
>>to your /etc/sysctl.conf, that will solve some problems.
>>
>>I don't know if this is related to your problem. But
>>the bug I have seen appears to be one threading library
>>making incorrect assumptions about the stack location.
>>AFAIK the i386 and i686 version of glibc use different
>>threading libraries, so it is very likely that your
>>problem would not show up on a Fedora Core 1 running on
>>newer hardware.
>
>
> Hmm. I tried this, but it didn't help. I surmise that what's going on
> here is that the sem_wait() call is being interrupted by a signal sent
> by gdb during the step operation. That's kind of a bummer. I guess
> one solution would be to do "while (sem_wait());" rather than plain
> sem_wait(). (In fact that does seem to work OK.) Or perhaps there is
> a way to do it in a less slipshod fashion, by masking signals on the
> thread doing the sem_wait. Any advice is appreciated. I've done a
> good bit of threaded programming on win32, but this is my first foray
> into POSIX threadland. The Glibc docs on semaphores seem incomplete;
> for example, they make the blanket statement "sem_wait() always returns
> 0", which is clearly false.

If you change your code to wait for the entry_point thread to finish, by
adding the line
pthread_join(th, NULL);
just after sem_destroy(), you'll see that your program never exits since
entry_point() never exits. So the doc is correct as long as you don't
use gdb... ;-)

The doc also says:
sem_destroy destroys a semaphore object, freeing the
resources it might hold. No threads should be waiting on
the semaphore at the time sem_destroy is called.

A slightly modified version of your code,
void *entry_point(void *arg)
{
    sem_t *psem = (sem_t *) arg;
    int rc = sem_wait(psem);

    if(rc != 0)
        perror("sem_wait");
    return 0;
}

Prints this when run in gdb:
sem_wait: Interrupted system call

I guess the right thing to do is not to call sem_destroy() when another
thread is waiting, just like the doc says.

HTH,
boa@home

>
> Thanks,
>
> -- Joe
>

Joe Knapka

8/9/2004 9:04:00 PM

boa <root@localhost.com> writes:

> Joe Knapka wrote:
>
> > Kasper Dupont <remove.invalid@nospam.lir.dk.invalid> writes:
> >
> >>Joe Knapka wrote:
> >>
> >>>Hi folks,
> >>>
> >>>(Platform is Fedora Core 1, unmodified, running on a crufty old
> >>>no-name P2/233.)
> >>
> >>I guess on such hardware you must be using the i386
> >>version of glibc. That glibc version is buggy and will
> >>cause programs to crash at random unless you disable
> >>exec-shield-randomize.
> >>
> >>You should add kernel.exec-shield-randomize = 0
> >>to your /etc/sysctl.conf, that will solve some problems.
> >>
> >>I don't know if this is related to your problem. But
> >>the bug I have seen appears to be one threading library
> >>making incorrect assumptions about the stack location.
> >>AFAIK the i386 and i686 version of glibc use different
> >>threading libraries, so it is very likely that your
> >>problem would not show up on a Fedora Core 1 running on
> >>newer hardware.
> > Hmm. I tried this, but it didn't help. I surmise that what's going on
> > here is that the sem_wait() call is being interrupted by a signal sent
> > by gdb during the step operation. That's kind of a bummer. I guess
> > one solution would be to do "while (sem_wait());" rather than plain
> > sem_wait(). (In fact that does seem to work OK.) Or perhaps there is
> > a way to do it in a less slipshod fashion, by masking signals on the
> > thread doing the sem_wait. Any advice is appreciated. I've done a
> > good bit of threaded programming on win32, but this is my first foray
> > into POSIX threadland. The Glibc docs on semaphores seem incomplete;
> > for example, they make the blanket statement "sem_wait() always returns
> > 0", which is clearly false.
>
> If you change your code to wait for the entry_point thread to finish,
> by adding the line
> pthread_join(th, NULL);
> just after sem_destroy(), you'll see that your program never exits
> since entry_point() never exits. So the doc is correct as long as you
> don't use gdb... ;-)

True, the code is clearly wrong in the case where main() actually
sleeps for two minutes and then calls sem_destroy() while entry_point()
is in sem_wait(). Below is a more correct version of the test code,
which exhibits exactly the same behavior (sem_wait() is interrupted
when single-stepping in gdb). Is there anything wrong with this test
code? Anyway, it does seem that gdb is sending a signal that causes
sem_wait() to be interrupted. What is the correct way to deal with
that eventuality? (A pointer to a relevant manpage or FAQ would
suffice.)

Thanks,

-- Joe

// File: semtest.c
#include <semaphore.h>
#include <pthread.h>
#include <unistd.h>   /* sleep() */
#include <string.h>   /* strerror() */
#include <stdio.h>

void* entry_point(void* arg) {
    sem_t* psem = (sem_t*)arg;
    int rc = sem_wait(psem);
    if (rc != 0) {
        perror("sem_wait");
    }
    return 0;
}

int main(void) {
    pthread_t th;
    sem_t sem;
    int rc = sem_init(&sem,0,0);
    if (rc != 0) {
        perror("sem_init");
        return 1;
    }
    /* pthread_* functions return an error number; they do not set errno. */
    rc = pthread_create(&th,0,entry_point,(void*)&sem);
    if (rc != 0) {
        fprintf(stderr,"pthread_create: %s\n",strerror(rc));
        return 1;
    }
    sleep(120);
    rc = sem_post(&sem);
    if (rc != 0) {
        perror("sem_post");
        return 1;
    }
    rc = pthread_join(th,0);
    if (rc != 0) {
        fprintf(stderr,"pthread_join: %s\n",strerror(rc));
        return 1;
    }
    rc = sem_destroy(&sem);
    if (rc != 0) {
        perror("sem_destroy");
    }
    return 0;
}


Alexander Terekhov

8/9/2004 9:09:00 PM

Joe Knapka wrote: ...

http://groups.google.com/groups?selm=3D2026CF.D102C06...

regards,
alexander.

Joe Knapka

8/9/2004 9:29:00 PM

Alexander Terekhov <terekhov@web.de> writes:

> Joe Knapka wrote: ...
>
> http://groups.google.com/groups?selm=3D2026CF.D102C06...

OK. So the answer seems to be, "Be prepared for EINTR, and deal
accordingly." Thank you very much.

-- Joe
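
For anyone who wants to see the EINTR case without a debugger, here is a minimal sketch that provokes it by hand. It assumes a Linux/glibc system where sem_wait() fails with EINTR when a handler installed without SA_RESTART runs in the waiting thread. SIGUSR1, the 10 ms retry interval, and the names provoke_eintr, gate and saw_eintr are all illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

static sem_t gate;             /* count starts at 0, so sem_wait() blocks */
static atomic_int saw_eintr;   /* set by the waiter once it is interrupted */

static void wake_handler(int signo) { (void)signo; }

static void *waiter(void *arg) {
    (void)arg;
    if (sem_wait(&gate) == -1 && errno == EINTR)
        atomic_store(&saw_eintr, 1);
    return NULL;
}

/* Returns 1 if sem_wait() failed with EINTR, 0 if not, -1 on setup error. */
int provoke_eintr(void) {
    struct sigaction sa;
    struct timespec pause = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    pthread_t th;
    int i;

    atomic_store(&saw_eintr, 0);
    sa.sa_handler = wake_handler;
    sa.sa_flags = 0;               /* deliberately no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;
    if (sem_init(&gate, 0, 0) != 0) return -1;
    if (pthread_create(&th, NULL, waiter, NULL) != 0) return -1;

    /* Keep signalling: an early signal may land before sem_wait() blocks. */
    for (i = 0; i < 500 && !atomic_load(&saw_eintr); i++) {
        pthread_kill(th, SIGUSR1);
        nanosleep(&pause, NULL);
    }

    sem_post(&gate);               /* release the waiter if it is still blocked */
    pthread_join(th, NULL);
    sem_destroy(&gate);
    return atomic_load(&saw_eintr);
}
```

The signal is sent repeatedly because a single pthread_kill() could arrive before the thread has reached sem_wait(), in which case the handler would run and the wait would never be interrupted.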


Alexander Terekhov

8/9/2004 11:15:00 PM

Joe Knapka wrote: ...

You better use mutexes and condvars.

regards,
alexander.

Kasper Dupont

8/10/2004 4:34:00 AM

Joe Knapka wrote:
>
> Anyway, it does seem that gdb is sending a signal that causes
> sem_wait() to be interrupted. What is the correct way to deal with
> that eventuality? (A pointer to a relevant manpage or FAQ would
> suffice.)

I guess you could just add a loop to retry the call until
it no longer fails with EINTR. But I would have expected
the kernel to do that, so I don't understand exactly why
you need to do that yourself.

I think something like this (untested) would work:
#include <errno.h>
#include <semaphore.h>

int sem_wait_rep(sem_t * sem)
{
    int r;
    do {
        r=sem_wait(sem);
    } while ((r==-1) && (errno==EINTR));  /* compare, don't assign */
    return r;
}

--
Kasper Dupont -- der bruger for meget tid paa usenet.
Design #413859655
It's a computer monitor! It is great for hammering in nails!

Joe Knapka

8/10/2004 4:36:00 AM

Alexander Terekhov <terekhov@web.de> writes:

> Joe Knapka wrote: ...
>
> You better use mutexes and condvars.

Could you explain further? I believe you mean that condition
variables+mutexes can replace semaphores, but is it necessary to do
that to get correct behavior?

I disregarded the condvars docs initially, since condvars seemed more
complicated than a semaphore. (The goal, BTW, is to implement a simple
queue for inter-thread messaging, with a single consumer and possibly
multiple producers; this is the textbook application for semaphores
:-)) I just sat down and read the condvars material, and they do seem
more flexible than semaphores, especially since I could use them
(straightforwardly) to perform a timed wait for a message, which seems
to be a pain with POSIX semaphores. (Why didn't they supply a
wait-with-timeout API for sems???)
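
To make the condvar option concrete, here is a minimal sketch of such a queue, with a timed wait on the consumer side. The names msgq, msgq_init, msgq_push, msgq_pop_timed and MSGQ_CAP are hypothetical, and the fixed capacity of 16 is an arbitrary assumption. One relevant point from POSIX: pthread_cond_timedwait() is specified never to return EINTR, so no retry wrapper is needed around it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

#define MSGQ_CAP 16

struct msgq {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    void *items[MSGQ_CAP];     /* ring buffer of message pointers */
    size_t head, count;
};

int msgq_init(struct msgq *q) {
    int rc;
    assert(q != NULL);
    q->head = q->count = 0;
    rc = pthread_mutex_init(&q->lock, NULL);
    if (rc != 0) return rc;
    return pthread_cond_init(&q->nonempty, NULL);
}

/* Producer side: returns 0, or EAGAIN if the queue is full. */
int msgq_push(struct msgq *q, void *msg) {
    int rc = 0;
    pthread_mutex_lock(&q->lock);
    if (q->count == MSGQ_CAP) {
        rc = EAGAIN;
    } else {
        q->items[(q->head + q->count) % MSGQ_CAP] = msg;
        q->count++;
        pthread_cond_signal(&q->nonempty);
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Consumer side: waits up to timeout_ms; returns 0 or ETIMEDOUT. */
int msgq_pop_timed(struct msgq *q, void **out, long timeout_ms) {
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);   /* default condvar clock */
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&q->lock);
    /* Recheck the predicate after every wakeup, spurious or not. */
    while (q->count == 0 && rc == 0)
        rc = pthread_cond_timedwait(&q->nonempty, &q->lock, &deadline);
    if (q->count > 0) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % MSGQ_CAP;
        q->count--;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}
```

Any number of producers can call msgq_push(); the single consumer calls msgq_pop_timed() and treats ETIMEDOUT as "no message yet".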

However, wrapping a check for EINTR around my sem_wait() calls
//seems// to work perfectly well for me given my existing code; are
you implying there are situations where that will fail? That's not
totally obvious from the message you cited (though I didn't have time
to read the links therein, only the message itself).
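
On the timeout point: POSIX.1-2001 does specify sem_timedwait(), so where it is available the EINTR wrapper and a timeout can be combined. The sketch below assumes the C library in use provides it (not verified here for the glibc on Fedora Core 1); sem_wait_ms is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>
#include <time.h>

/* Returns 0 on success, or -1 with errno set (ETIMEDOUT on timeout). */
int sem_wait_ms(sem_t *sem, long timeout_ms) {
    struct timespec deadline;
    int rc;

    assert(sem != NULL && timeout_ms >= 0);
    /* sem_timedwait() takes an absolute CLOCK_REALTIME deadline, so a
     * retry after EINTR does not extend the total wait. */
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0) return -1;
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }
    do {
        rc = sem_timedwait(sem, &deadline);
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```

Because the deadline is computed once, being interrupted repeatedly cannot stretch the wait beyond the requested timeout.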

Thank you very much for your time and attention,

-- Joe


comp.programming.threads
