Re: [PATCH] pthread_kill.3: Update to match POSIX.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



that was never the issue though ... well, okay, that's *an* issue, but
not the one i'm most concerned about :-)

the issue i'm trying to fix (and so maybe need to find even clearer
wording for) is basically this:

  * lots of people don't realize that pthread_t != pid_t
  * they think that "the worst that can happen" when passing a
no-longer valid pthread_t to these functions is ESRCH
  * they don't realize that using pthread_kill(3) like this is just a
use-after-free bug

i think one reason this persists is glibc's thread cache makes it
harder to hit there. i don't actually know whether glibc's thread
cache has an eviction policy at all? if it doesn't, that would indeed
turn this use-after-free into "just" a question of whether you have
the right pid_t or not. but assuming glibc's thread cache _does_ have
an eviction policy, glibc's in the same boat as more svelte libcs
(such as bionic and musl, plus the BSDs, and also Apple's anonymous
libc) --- it just needs more threads.

this confusion causes bugs (and crashes) today, and it's only going to
get worse as we get better tools for detecting UAF, such as Arm MTE,
and it's really hard to get people to understand the problem when the
man page is worded as it currently is (with a weak "can, for example"
hidden in the NOTES section).

if it really is the case that glibc has no eviction policy, then my
suggestion will probably be that we come up with wording along the
lines of "libc implementers face a choice here between the memory cost
of never freeing [whatever man7 calls the TCB] or not being able to
detect this temporal error; of all the libcs, only glibc chose the
former".

if nothing else that should at least answer the question "why can't
you just be like glibc?" :-)

the current text kind of sounds like "glibc has a great
implementation, but POSIX doesn't require that, and everyone else
sucks", but that's pretty misleading. (even if the glibc
implementation is safe, which i'm not sure it is. it also seems like
the "that's not a cache, that's a memory leak" implementation would
preclude memory tagging for thread stacks, which would be another
infelicity?)

-*-

this page is a bit weird in general... ESRCH isn't mentioned in
ERRORS, but the sig == 0 case is called out in DESCRIPTION, but you
need to read NOTES to find out that that's basically broken. and
no-where on the page do we try to describe alternatives that _do_
work. (happy to volunteer text along the lines of "you need to stash
your thread's tid at a time when you *know* the pthread_t is valid,
such as when the thread starts, and then you can use that with kill(2)
and sig == 0 to do what you _thought_ pthread_kill(3) with sig == 0
did, which still isn't 100% safe in light of pid wrapping, but is the
best you can get if you refuse to actually keep track of your threads'
lifetimes properly :-P ".)

actually, even this would be quite a good improvement:

        If sig is 0, then no signal is sent, but error checking is still
-       performed.
+       performed. See NOTES for why this can't be used to detect
whether another thread is still running.


On Tue, Nov 9, 2021 at 11:16 PM Florian Weimer <fw@xxxxxxxxxxxxx> wrote:
>
> > any comment from either of the maintainers?
> >
> > i think what we currently have on this page is factually incorrect,
> > and this patch better matches reality.
>
> One more data point:
>
> As of glibc 2.34, pthread_kill in glibc cannot fail with ESRCH anymore
> (unless the kernel thread is terminated by a direct system call).  And
> the race that the signal could be sent to the wrong thread is gone.
>
> > On Tue, Nov 12, 2019 at 10:10 PM Florian Weimer <fw@xxxxxxxxxxxxx> wrote:
> >>
> >> * enh:
> >>
> >> > On Tue, Nov 12, 2019 at 9:51 PM Florian Weimer <fw@xxxxxxxxxxxxx> wrote:
> >> >>
> >> >> * enh:
> >> >>
> >> >> > no, because the C library has two choices when a thread exits:
> >> >> >
> >> >> > 1. unmap the thread.
> >> >> >
> >> >> > 2. keep the thread around for recycling.
> >> >> >
> >> >> > if you choose 1 (optimizing for space, like Android), your dereference
> >> >> > is illegal.
> >> >>
> >> >> This choice is only available for threads in a detached state.  For
> >> >> joinable threads, a conforming implementation cannot immediately
> >> >> deallocate all data structures on thread termination.  Among other
> >> >> things, it has to store the future return value of pthread_join
> >> >> somewhere.
> >> >
> >> > ah, you're trying to say "signal 0 is potentially usable for a
> >> > joinable thread that's waiting to be joined"? that's true, but i'm not
> >> > sure how that's relevant to this patch. that wouldn't be an "invalid
> >> > thread ID" until it's joined.
> >>
> >> Correct.  That's POSIX's argument why ESRCH wouldn't be valid to
> >> return here.  It's still a forceful loss of information, and
> >> particularly annoying since POSIX doesn't specify pthread_tryjoin.
> >>
> >> But I'm glad we've brought our discussion to a conclusion. 8-)



[Index of Archives]     [Kernel Documentation]     [Netdev]     [Linux Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux