I'm hoping I'll find someone out there who knows the POSIX/Linux thread implementation we've now got in RH9 (NPTL?) in some detail!
We have an app that we're porting to RH9 using the new implementation, and we're getting a lock-up that I can't reproduce with a noddy example.
Essentially, one 'parent' thread is trying to kill a 'child' thread (that it created earlier) with pthread_cancel(). The child thread, if gdb (and strace, IIRC) is to be believed, is sat in a sem_wait(). The child thread has set its cancel type to asynchronous (PTHREAD_CANCEL_ASYNCHRONOUS), so it should go away as soon as it's told.
The parent thread then immediately does a pthread_join() to ensure the child's gone - then nothing. The child stays in sem_wait(), the parent never returns from pthread_join().
Anyone know what can cause this? I think it's some funny race condition, as an occasional sprinkling of printf() debug can make it, if not go away, then at least less likely. However, a small test prog. I wrote that does the same thing always kills the child thread as I'd expect, both if it's killed within sem_wait() and if it's killed just beforehand. AIUI, sem_wait() is supposed to be a cancellation point anyway.
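For reference, the pattern boils down to roughly the following. This is a minimal sketch and not our real code: the names child_main, cancel_blocked_child, never_posted and child_ready are made up for illustration, and the handshake only guarantees that the child has started, so the cancel may land just before or inside sem_wait().

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
static sem_t never_posted;  /* nobody ever posts this, so the child blocks */
static sem_t child_ready;   /* child posts this once it is running */

static void *child_main(void *arg)
{
    int old_type;
    (void)arg;

    /* Tell the parent we are up while cancellation is still deferred. */
    sem_post(&child_ready);

    /* Switch to asynchronous cancellation, as the real child does. */
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_type);

    /* Block here until cancelled; sem_wait() is also a cancellation point. */
    sem_wait(&never_posted);
    return NULL;  /* only reached if sem_wait() returns without a cancel */
}

/* Returns 0 if the child was cancelled and joined, 1 if it exited some
 * other way, -1 if any setup call or pthread call failed. */
int cancel_blocked_child(void)
{
    pthread_t child;
    void *result = NULL;

    if (sem_init(&never_posted, 0, 0) != 0)
        return -1;
    if (sem_init(&child_ready, 0, 0) != 0)
        return -1;
    if (pthread_create(&child, NULL, child_main, NULL) != 0)
        return -1;

    /* Wait until the child has actually started. */
    if (sem_wait(&child_ready) != 0)
        return -1;

    /* The 'parent' side: cancel, then immediately join. */
    if (pthread_cancel(child) != 0)
        return -1;
    if (pthread_join(child, &result) != 0)
        return -1;

    sem_destroy(&child_ready);
    sem_destroy(&never_posted);
    return result == PTHREAD_CANCELED ? 0 : 1;
}
```

Call cancel_blocked_child() from main() and build with something like gcc -pthread. In the test prog. this sort of thing always comes back with the child cancelled; in the real app the equivalent pthread_join() never returns.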
-- 
[neil@xxx ~]# rm -f .signature
[neil@xxx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@xxx ~]# exit

-- 
Shrike-list mailing list
Shrike-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/shrike-list