that was never the issue though ... well, okay, that's *an* issue, but not the one i'm most concerned about :-) the issue i'm trying to fix (and so maybe need to find even clearer wording for) is basically this: * lots of people don't realize that pthread_t != pid_t * they think that "the worst that can happen" when passing a no-longer valid pthread_t to these functions is ESRCH * they don't realize that using pthread_kill(3) like this is just a use-after-free bug i think one reason this persists is glibc's thread cache makes it harder to hit there. i don't actually know whether glibc's thread cache has an eviction policy at all? if it doesn't, that would indeed turn this use-after-free into "just" a question of whether you have the right pid_t or not. but assuming glibc's thread cache _does_ have an eviction policy, glibc's in the same boat as more svelte libcs (such as bionic and musl, plus the BSDs, and also Apple's anonymous libc) --- it just needs more threads. this confusion causes bugs (and crashes) today, and it's only going to get worse as we get better tools for detecting UAF, such as Arm MTE, and it's really hard to get people to understand the problem when the man page is worded as it currently is (with a weak "can, for example" hidden in the NOTES section). if it really is the case that glibc has no eviction policy, then my suggestion will probably be that we come up with wording along the lines of "libc implementers face a choice here between the memory cost of never freeing [whatever man7 calls the TCB] or not being able to detect this temporal error; of all the libcs, only glibc chose the former". if nothing else that should at least answer the question "why can't you just be like glibc?" :-) the current text kind of sounds like "glibc has a great implementation, but POSIX doesn't require that, and everyone else sucks", but that's pretty misleading. (even if the glibc implementation is safe, which i'm not sure it is. it also seems like the "that's not a cache, that's a memory leak" implementation would preclude memory tagging for thread stacks, which would be another infelicity?) -*- this page is a bit weird in general... ESRCH isn't mentioned in ERRORS, but the sig == 0 case is called out in DESCRIPTION, but you need to read NOTES to find out that that's basically broken. and no-where on the page do we try to describe alternatives that _do_ work. (happy to volunteer text along the lines of "you need to stash your thread's tid at a time when you *know* the pthread_t is valid, such as when the thread starts, and then you can use that with kill(2) and sig == 0 to do what you _thought_ pthread_kill(3) with sig == 0 did, which still isn't 100% safe in light of pid wrapping, but is the best you can get if you refuse to actually keep track of your threads' lifetimes properly :-P ".) actually, even this would be quite a good improvement: If sig is 0, then no signal is sent, but error checking is still - performed. + performed. See NOTES for why this can't be used to detect whether another thread is still running. On Tue, Nov 9, 2021 at 11:16 PM Florian Weimer <fw@xxxxxxxxxxxxx> wrote: > > > any comment from either of the maintainers? > > > > i think what we currently have on this page is factually incorrect, > > and this patch better matches reality. > > One more data point: > > As of glibc 2.34, pthread_kill in glibc cannot fail with ESRCH anymore > (unless the kernel thread is terminated by a direct system call). And > the race that the signal could be sent to the wrong thread is gone. > > > On Tue, Nov 12, 2019 at 10:10 PM Florian Weimer <fw@xxxxxxxxxxxxx> wrote: > >> > >> * enh: > >> > >> > On Tue, Nov 12, 2019 at 9:51 PM Florian Weimer <fw@xxxxxxxxxxxxx> wrote: > >> >> > >> >> * enh: > >> >> > >> >> > no, because the C library has two choices when a thread exits: > >> >> > > >> >> > 1. unmap the thread. > >> >> > > >> >> > 2. keep the thread around for recycling. > >> >> > > >> >> > if you choose 1 (optimizing for space, like Android), your dereference > >> >> > is illegal. > >> >> > >> >> This choice is only available for threads in a detached state. For > >> >> joinable threads, a conforming implementation cannot immediately > >> >> deallocate all data structures on thread termination. Among other > >> >> things, it has to store the future return value of pthread_join > >> >> somewhere. > >> > > >> > ah, you're trying to say "signal 0 is potentially usable for a > >> > joinable thread that's waiting to be joined"? that's true, but i'm not > >> > sure how that's relevant to this patch. that wouldn't be an "invalid > >> > thread ID" until it's joined. > >> > >> Correct. That's POSIX's argument why ESRCH wouldn't be valid to > >> return here. It's still a forceful loss of information, and > >> particularly annoying since POSIX doesn't specify pthread_tryjoin. > >> > >> But I'm glad we've brought our discussion to a conclusion. 8-)