Re: [PATCH] pthread_kill.3: Update to match POSIX.

> the issue i'm trying to fix (and so maybe need to find even clearer
> wording for) is basically this:
>
>   * lots of people don't realize that pthread_t != pid_t
>   * they think that "the worst that can happen" when passing a
> no-longer valid pthread_t to these functions is ESRCH
>   * they don't realize that using pthread_kill(3) like this is just a
> use-after-free bug

Okay, this is not just about pthread_kill, though.  So man-pages-wise,
pthread_kill may not be the right place to document it.
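
To make the pthread_t versus pid_t distinction from the quoted list
concrete, here is a minimal Linux-only sketch.  The helper names
(current_kernel_tid, record_tid, tid_of_short_lived_thread) are
hypothetical and only serve the illustration.  It assumes
syscall(SYS_gettid) is available.  It is not a recommendation to probe
threads by TID.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the calling thread (Linux-specific). */
pid_t current_kernel_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Thread body: stores its own kernel TID in the pid_t that arg points to. */
void *record_tid(void *arg)
{
    assert(arg != NULL);
    *(pid_t *)arg = current_kernel_tid();
    return NULL;
}

/* Starts a thread, joins it, and returns the TID it had while it ran,
 * or -1 if the thread could not be created or joined. */
pid_t tid_of_short_lived_thread(void)
{
    pthread_t handle;   /* opaque library handle, not a kernel ID */
    pid_t tid = -1;

    if (pthread_create(&handle, NULL, record_tid, &tid) != 0)
        return -1;
    if (pthread_join(handle, NULL) != 0)
        return -1;

    /* After the join, `handle` must not be passed to any pthread
     * function again.  `tid` is now only a number that the kernel is
     * free to reuse for an unrelated thread. */
    return tid;
}
```

The pthread_t is whatever the library chooses to hand out (a pointer to
the thread's control block in glibc), while the TID is a number
assigned by the kernel.  Neither says anything reliable about the
thread once it has been joined.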

> i think one reason this persists is glibc's thread cache makes it
> harder to hit there. i don't actually know whether glibc's thread
> cache has an eviction policy at all?

It's typically at least five entries deep, so it's pretty good at
obscuring these issues.

In glibc 2.34, the stack cache size is tunable, and the cache can be
disabled (more or less) using

  GLIBC_TUNABLES=glibc.pthread.stack_cache_size=0

for typical distribution builds that do not disable tunables.  If you do
that, you get a segmentation fault for such invalid pthread_kill calls.

(The ‘more or less’ part refers to detached threads, where the TCB
lingers around after exit because it is tied to the thread stack in our
current implementation.)

> if it doesn't, that would indeed
> turn this use-after-free into "just" a question of whether you have
> the right pid_t or not. but assuming glibc's thread cache _does_ have
> an eviction policy, glibc's in the same boat as more svelte libcs
> (such as bionic and musl, plus the BSDs, and also Apple's anonymous
> libc) --- it just needs more threads.

Right.

> this confusion causes bugs (and crashes) today, and it's only going to
> get worse as we get better tools for detecting UAF, such as Arm MTE,
> and it's really hard to get people to understand the problem when the
> man page is worded as it currently is (with a weak "can, for example"
> hidden in the NOTES section).

I helped to fix an incorrect LTP test around precisely this, and the
GLIBC_TUNABLES setting was helpful to show that there was indeed a
use-after-free.  Maybe that can help you with your “just like glibc”
problems, too.

glibc 2.35 (and glibc 2.34 post-release) also break pthread_kill-based
probing loops that try to detect kernel thread exit.  An unjoined
pthread_t can “receive” signals even if the TID is no longer in use on
the kernel side.

> this page is a bit weird in general... ESRCH isn't mentioned in
> ERRORS, but the sig == 0 case is called out in DESCRIPTION, but you
> need to read NOTES to find out that that's basically broken. and
> no-where on the page do we try to describe alternatives that _do_
> work. (happy to volunteer text along the lines of "you need to stash
> your thread's tid at a time when you *know* the pthread_t is valid,
> such as when the thread starts, and then you can use that with kill(2)
> and sig == 0 to do what you _thought_ pthread_kill(3) with sig == 0
> did, which still isn't 100% safe in light of pid wrapping, but is the
> best you can get if you refuse to actually keep track of your threads'
> lifetimes properly :-P ".)

I don't think that's good advice.  Any such use has TID race issues
(even if you use tgkill).
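
For comparison, here is a minimal sketch of tracking the thread's
lifetime in application state instead of probing.  It assumes the
thread stays joinable and that a single owner thread does all the
signalling and joining.  The tracked_thread structure and the
tracked_* functions are hypothetical names, not part of any libc.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct tracked_thread {
    pthread_t handle;       /* valid only while joined is false */
    bool joined;            /* touched by the owner thread only */
    atomic_bool finished;   /* set by the thread when fn has returned */
    void *(*fn)(void *);
    void *arg;
};

/* Runs the user function, then records that it has returned.  A thread
 * that calls pthread_exit or is cancelled never sets the flag. */
static void *tracked_trampoline(void *p)
{
    struct tracked_thread *t = p;
    void *ret = t->fn(t->arg);
    atomic_store(&t->finished, true);
    return ret;
}

int tracked_start(struct tracked_thread *t, void *(*fn)(void *), void *arg)
{
    assert(t != NULL && fn != NULL);
    t->fn = fn;
    t->arg = arg;
    atomic_init(&t->finished, false);
    int rc = pthread_create(&t->handle, NULL, tracked_trampoline, t);
    t->joined = (rc != 0);  /* no valid handle if creation failed */
    if (rc != 0)
        atomic_store(&t->finished, true);
    return rc;
}

/* "Is it still running?" answered without touching the pthread_t. */
bool tracked_is_running(struct tracked_thread *t)
{
    return !atomic_load(&t->finished);
}

/* The handle is passed to pthread_kill only before the join, which is
 * the period during which a joinable thread's pthread_t stays valid. */
int tracked_kill(struct tracked_thread *t, int sig)
{
    if (t->joined)
        return ESRCH;
    return pthread_kill(t->handle, sig);
}

int tracked_join(struct tracked_thread *t, void **ret)
{
    if (t->joined)
        return ESRCH;
    int rc = pthread_join(t->handle, ret);
    if (rc == 0)
        t->joined = true;
    return rc;
}

/* Trivial worker used only to exercise the sketch: returns its argument. */
void *tracked_example_worker(void *arg)
{
    return arg;
}
```

The ESRCH returned here comes from the wrapper's own bookkeeping, not
from the library inspecting a stale handle, so it is reliable in a way
that the pthread_kill result for a dead pthread_t is not.  If several
threads may signal or join, the joined flag needs a lock.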

> actually, even this would be quite a good improvement:
>
>         If sig is 0, then no signal is sent, but error checking is still
> -       performed.
> +       performed. See NOTES for why this can't be used to detect
> whether another thread is still running.

That makes sense.  (And I need to fix the bug that we don't have enough
error checking, now that we no longer try to send the signal.)

Thanks,
Florian




