Hello Adhemerval,

On Mon, 23 Nov 2020 at 15:39, Adhemerval Zanella
<adhemerval.zanella@xxxxxxxxxx> wrote:
>
>
> On 21/11/2020 18:41, Michael Kerrisk (man-pages) wrote:
> > Hello Mike,
> >
> > On 11/21/20 6:54 PM, Mike Crowe wrote:
> >> Hi Michael,
> >>
> >> On Saturday 21 November 2020 at 07:59:04 +0100, Michael Kerrisk (man-pages) wrote:
> >>> I've been taking a closer look at the new pthread*clock*() APIs:
> >>>     pthread_clockjoin_np()
> >>>     pthread_cond_clockwait()
> >>>     pthread_mutex_clocklock()
> >>>     pthread_rwlock_clockrdlock()
> >>>     pthread_rwlock_clockwrlock()
> >>>     sem_clockwait()
> >>>
> >>> I've noticed some oddities, and at least a couple of bugs.
> >>>
> >>> First off, I just note that there's a surprisingly wide variation in
> >>> the low-level futex calls being used by these APIs when implementing
> >>> CLOCK_REALTIME support:
> >>>
> >>> pthread_rwlock_clockrdlock()
> >>> pthread_rwlock_clockwrlock()
> >>> sem_clockwait()
> >>> pthread_cond_clockwait()
> >>>         futex(addr,
> >>>               FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 3,
> >>>               {abstimespec}, FUTEX_BITSET_MATCH_ANY)
> >>>         (This implementation seems to be okay.)
> >>>
> >>> pthread_clockjoin_np()
> >>>         futex(addr, FUTEX_WAIT, 48711, {reltimespec})
> >>>         (This is buggy; see below.)
> >>>
> >>> pthread_mutex_clocklock()
> >>>         futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec})
> >>>         (There are bugs and strangeness here; see below.)
> >>
> >> Yes, I found it very confusing when I started adding the new
> >> pthread*clock*() functions, and it still takes me a while to find the
> >> right functions when I look now. I believe that Adhemerval was talking
> >> about simplifying some of this.
> >>
> >>> === Bugs ===
> >>>
> >>> pthread_clockjoin_np():
> >>> As already recognized in another mail thread [1], this API accepts any
> >>> kind of clockid, even though it doesn't support most of them.
> >>
> >> Well, it sort of does support them at least as well as many other
> >> implementations of such functions do - it just calculates a relative
> >> timeout using the supplied clock and then uses that. But, ...
> >>
> >>> A further bug is that even if CLOCK_REALTIME is specified,
> >>> pthread_clockjoin_np() sleeps against the CLOCK_MONOTONIC clock.
> >>> (Currently it does this for *all* clockid values.) The problem here is
> >>> that the FUTEX_WAIT operation sleeps against the CLOCK_MONOTONIC clock
> >>> by default. At the least, the FUTEX_CLOCK_REALTIME flag is required for
> >>> this case. Alternatively, an implementation using
> >>> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME (like the first four
> >>> functions listed above) might be appropriate.
> >>
> >> ...this is one downside of that. That bug was inherited from the
> >> existing pthread_timedjoin_np implementation.
>
> Indeed, I am working on refactoring the internal futex usage to fix
> this issue. Thinking twice, I see that using FUTEX_WAIT_BITSET without
> any additional clock adjustments should be better than calling
> clock_gettime plus FUTEX_WAIT.

Yes, that would be my estimate as well.

> > Oh -- that's pretty sad. I hadn't considered the possibility that
> > the (longstanding) "timed" functions might have the same bug.
> >
> >> I was planning to write a patch to just limit the supported clocks, but
> >> I'll have a go at fixing the bug you describe properly instead first,
> >> which will limit the implementation to CLOCK_REALTIME and
> >> CLOCK_MONOTONIC anyway.
>
> I am working on this as well.

Thanks.

> >>> ===
> >>>
> >>> pthread_mutex_clocklock():
> >>> First of all, there's a small oddity. Suppose we specify the clockid
> >>> as CLOCK_REALTIME, and then while the call is blocked, we set the
> >>> realtime clock backwards.
> >>> Then, there will be further futex calls to
> >>> handle the modification to the clock (and possibly multiple futex
> >>> calls if the realtime clock is adjusted repeatedly):
> >>>
> >>>     futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec1})
> >>>     futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec2})
> >>>     ...
> >>>
> >>> Then there seems to be a bug. If we specify the clockid as
> >>> CLOCK_REALTIME, and while the call is blocked we set the realtime
> >>> clock forwards, then the blocking interval of the call is *not*
> >>> adjusted (shortened), when of course it should be.
> >>
> >> This is because __lll_clocklock_wait ends up doing a relative wait
> >> rather than an absolute one, so it suffers from the same problem as
> >> pthread_clockjoin_np.
>
> It is another indication that it would be better to use
> FUTEX_WAIT_BITSET instead. :-)

> >>> ===
> >>>
> >>> I've attached a couple of small test programs at the end of this mail.
> >>
> >> Thanks for looking at this in detail.
> >>
> >> AFAIK, all of these bugs also affected the corresponding existing
> >> pthread*timed*() functions. When I added the new pthread*clock*()
> >> functions I was trying to keep my changes to the existing code as
> >> small as possible. (I started out trying to "scratch the itch" of
> >> libstdc++ std::condition_variable::wait_for misbehaving[2] when the
> >> system clock was warped in 2015, and all of this ballooned from that.)
> >> Now that the functions are in, I think there's definitely scope for
> >> improving the implementation, and I will try to do so as time and
> >> confidence allows - the implementation of
> >> __pthread_mutex_clocklock_common scares me greatly!
> >
> > Yeah, a lot of glibc code is not so easy to follow... Thank you for
> > taking a look.
>
> The futex code is indeed convoluted; it was initially coded all in
> lowlevellock.h. Then it was moved out to lowlevellock-futex.h with the
> NaCl port (which required an override of the futex call to implement
> the NaCl libcalls).
>
> Later, futex-internal.h was added, which duplicated some
> lowlevellock-futex.h calls with inline functions plus some error
> checking (as libstdc++ does).
>
> So currently we have the nptl pthread code using both interfaces,
> which is confusing and duplicates the logic. The patchset I am working
> on makes the NPTL code use only futex-internal.h, removes some
> unrequired functions from it, and simplifies the functions required in
> futex-internal.c.
>
> The idea is that lowlevellock-futex.h would be used only by
> lowlevellock.h and futex-internal.h. I am wondering whether it would
> be useful to keep lowlevellock-futex.h at all; it is just a thin
> wrapper over the futex syscall with a *lot* of unused macros and
> without proper y2038 support (which futex-internal.h does have).

Thanks, Adhemerval. And more generally, thanks for all of the clean-up
work you do in the codebase. That's just so valuable!

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/