On 21/11/2020 18:41, Michael Kerrisk (man-pages) wrote:
> Hello Mike,
>
> On 11/21/20 6:54 PM, Mike Crowe wrote:
>> Hi Michael,
>>
>> On Saturday 21 November 2020 at 07:59:04 +0100, Michael Kerrisk
>> (man-pages) wrote:
>>> I've been taking a closer look at the new pthread*clock*() APIs:
>>>
>>> pthread_clockjoin_np()
>>> pthread_cond_clockwait()
>>> pthread_mutex_clocklock()
>>> pthread_rwlock_clockrdlock()
>>> pthread_rwlock_clockwrlock()
>>> sem_clockwait()
>>>
>>> I've noticed some oddities, and at least a couple of bugs.
>>>
>>> First off, I just note that there's a surprisingly wide variation in
>>> the low-level futex calls being used by these APIs when implementing
>>> CLOCK_REALTIME support:
>>>
>>> pthread_rwlock_clockrdlock()
>>> pthread_rwlock_clockwrlock()
>>> sem_clockwait()
>>> pthread_cond_clockwait()
>>>     futex(addr, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 3,
>>>           {abstimespec}, FUTEX_BITSET_MATCH_ANY)
>>>     (This implementation seems to be okay.)
>>>
>>> pthread_clockjoin_np()
>>>     futex(addr, FUTEX_WAIT, 48711, {reltimespec})
>>>     (This is buggy; see below.)
>>>
>>> pthread_mutex_clocklock()
>>>     futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec})
>>>     (There are bugs and strangeness here; see below.)
>>
>> Yes, I found it very confusing when I started adding the new
>> pthread*clock*() functions, and it still takes me a while to find the
>> right functions when I look now. I believe that Adhemerval was talking
>> about simplifying some of this.
>>
>>> === Bugs ===
>>>
>>> pthread_clockjoin_np():
>>> As already recognized in another mail thread [1], this API accepts
>>> any kind of clockid, even though it doesn't support most of them.
>>
>> Well, it sort of does support them, at least as well as many other
>> implementations of such functions do - it just calculates a relative
>> timeout using the supplied clock and then uses that. But, ...
>>
>>> A further bug is that even if CLOCK_REALTIME is specified,
>>> pthread_clockjoin_np() sleeps against the CLOCK_MONOTONIC clock.
>>> (Currently it does this for *all* clockid values.) The problem here
>>> is that the FUTEX_WAIT operation sleeps against the CLOCK_MONOTONIC
>>> clock by default. At the least, the FUTEX_CLOCK_REALTIME flag is
>>> required for this case. Alternatively, an implementation using
>>> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME (like the first four
>>> functions listed above) might be appropriate.
>>
>> ...this is one downside of that. That bug was inherited from the
>> existing pthread_timedjoin_np implementation.

Indeed, I am working on refactoring the internal futex usage to fix
this issue. Thinking about it some more, I see that using
FUTEX_WAIT_BITSET without any additional clock adjustment should be
better than calling clock_gettime plus FUTEX_WAIT.

> Oh -- that's pretty sad. I hadn't considered the possibility that
> the (longstanding) "timed" functions might have the same bug.
>
>> I was planning to write a patch to just limit the supported clocks,
>> but I'll have a go at fixing the bug you describe properly instead
>> first, which will limit the implementation to CLOCK_REALTIME and
>> CLOCK_MONOTONIC anyway.

I am working on this as well.
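The direction I have in mind is to always hand the kernel an absolute
timeout, along these lines (an untested sketch against the raw syscall,
not the actual glibc code; the helper names are just for illustration
and error handling is omitted):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Relative wait: the kernel measures the timeout against
   CLOCK_MONOTONIC, so the caller has to convert its CLOCK_REALTIME
   abstime into a relative timeout with clock_gettime first, and any
   later clock_settime change is never noticed.  This is what the
   buggy paths do today.  */
static int
futex_wait_relative (unsigned int *addr, unsigned int expected,
                     const struct timespec *reltime)
{
  return syscall (SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected,
                  reltime, NULL, 0);
}

/* Absolute wait: with FUTEX_CLOCK_REALTIME the kernel measures the
   absolute timeout against CLOCK_REALTIME itself, so a clock warp in
   either direction is handled without re-issuing the call.  */
static int
futex_wait_absolute (unsigned int *addr, unsigned int expected,
                     const struct timespec *abstime)
{
  return syscall (SYS_futex, addr,
                  FUTEX_WAIT_BITSET_PRIVATE | FUTEX_CLOCK_REALTIME,
                  expected, abstime, NULL, FUTEX_BITSET_MATCH_ANY);
}

This is essentially what the first four functions listed above already
do through the bitset operation.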
>>
>>> ===
>>> pthread_mutex_clocklock():
>>> First of all, there's a small oddity. Suppose we specify the clockid
>>> as CLOCK_REALTIME, and then, while the call is blocked, we set the
>>> realtime clock backwards. Then there will be further futex calls to
>>> handle the modification to the clock (and possibly multiple futex
>>> calls if the realtime clock is adjusted repeatedly):
>>>
>>>     futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec1})
>>>     futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec2})
>>>     ...
>>>
>>> Then there seems to be a bug. If we specify the clockid as
>>> CLOCK_REALTIME, and while the call is blocked we set the realtime
>>> clock forwards, then the blocking interval of the call is *not*
>>> adjusted (shortened), when of course it should be.
>>
>> This is because __lll_clocklock_wait ends up doing a relative wait
>> rather than an absolute one, so it suffers from the same problem as
>> pthread_clockjoin_np.

It is another indication that it would be better to use
FUTEX_WAIT_BITSET instead. (A rough reproducer sketch is at the end of
this mail.)

>>> ===
>>>
>>> I've attached a couple of small test programs at the end of this
>>> mail.
>>
>> Thanks for looking at this in detail.
>>
>> AFAIK, all of these bugs also affected the corresponding existing
>> pthread*timed*() functions. When I added the new pthread*clock*()
>> functions I was trying to keep my changes to the existing code as
>> small as possible. (I started out trying to "scratch the itch" of
>> libstdc++ std::condition_variable::wait_for misbehaving [2] when the
>> system clock was warped in 2015, and all of this ballooned from
>> that.) Now that the functions are in, I think there's definitely
>> scope for improving the implementation, and I will try to do so as
>> time and confidence allow - the implementation of
>> __pthread_mutex_clocklock_common scares me greatly!
>
> Yeah, a lot of glibc code is not so easy to follow... Thank you for
> taking a look.

The futex code is indeed convoluted. It was initially all coded in
lowlevellock.h. Then it was moved out to lowlevellock-futex.h with the
NaCl port (which required an override of the futex call to implement
the NaCl libcalls). Later, futex-internal.h was added, which duplicated
some lowlevellock-futex.h calls with inline functions plus some error
checking (as libstdc++ does). So currently we have the nptl pthread
code using both interfaces, which is confusing and duplicates the
logic.

The patchset I am working on makes the NPTL code use only
futex-internal.h, removes some unneeded functions from it, and
simplifies the functions provided by futex-internal.c. The idea is
that lowlevellock-futex.h would be used only by lowlevellock.h and
futex-internal.h. I am also wondering whether it is worth keeping
lowlevellock-futex.h at all: it is just a thin wrapper over the futex
syscall, with a *lot* of unused macros and without proper y2038 support
(which futex-internal.h does have).
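For reference, here is a rough reproducer for the
pthread_mutex_clocklock forward-warp bug (my own untested sketch, not
Michael's attached test programs; it warps the system clock, so run it
only as root in a scratch VM, and build with -pthread):

#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

static void *
waiter (void *arg)
{
  struct timespec abstime;
  clock_gettime (CLOCK_REALTIME, &abstime);
  abstime.tv_sec += 3600;  /* Give up one hour in the future.  */

  int ret = pthread_mutex_clocklock (&mutex, CLOCK_REALTIME, &abstime);
  printf ("pthread_mutex_clocklock: %d (%s)\n", ret, strerror (ret));
  return NULL;
}

int
main (void)
{
  pthread_mutex_lock (&mutex);  /* Make sure the waiter blocks.  */

  pthread_t thr;
  pthread_create (&thr, NULL, waiter, NULL);
  sleep (1);  /* Let the waiter reach the futex call.  */

  /* Warp the realtime clock two hours forward, past the abstime.  */
  struct timespec now;
  clock_gettime (CLOCK_REALTIME, &now);
  now.tv_sec += 7200;
  if (clock_settime (CLOCK_REALTIME, &now) != 0)
    perror ("clock_settime");

  /* With a correct (absolute) wait this join returns almost
     immediately with ETIMEDOUT from the waiter; with the current
     relative wait the program hangs for the remainder of the hour.  */
  pthread_join (thr, NULL);
  return 0;
}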