At 11:23 on 08/06/21, Peter Zijlstra wrote:
> On Tue, Jun 08, 2021 at 02:26:22PM +0200, Sebastian Andrzej Siewior wrote:
>> On 2021-06-07 12:40:54 [-0300], André Almeida wrote:
>>>
>>> When I first read Thomas' proposal for a per-process table, I thought that
>>> the main goal there was to solve NUMA locality issues, not RT latency,
>>> but I think you are right. However, re-reading the thread at [0], it
>>> seems that the RT problems were not completely solved in that
>>> interface, maybe the people involved with that patchset can help to shed
>>> some light on it.
>>>
>>> Otherwise, this same proposal could be integrated in futex2, given that
>>> we would only need to provide to userland some extra flags and add some
>>> `if`s around the hash table code (in a very similar way the NUMA code
>>> will be implemented in futex2).
>>
>> There are slides at [0] describing some attempts and the kernel tree [1]
>> from that time.
>>
>> The process-table solves the problem to some degree that two random
>> processes don't collide on the same hash bucket. But as Peter Zijlstra
>> pointed out back then two threads from the same task could collide on
>> the same hash bucket (and with ASLR not always). So the collision is
>> there but limited and this was not perfect.
>>
>> All the attempts with API extensions didn't go well because glibc did
>> not want to change a bit. This starts with a mutex that has a static
>> initializer which has to work (I don't remember why the first
>> pthread_mutex_lock() could not fail with -ENOMEM but there was
>> something) and ends with glibc's struct mutex which is full and has no
>> room for additional data storage.
>>
>> The additional data in user's struct mutex + init would have the benefit
>> that instead of uaddr (which is hashed for the in-kernel lookup) a cookie
>> could be used for the hash-less lookup (and NUMA pointer where memory
>> should be stored).
>>
>> So. We couldn't change a thing back then so nothing did happen. We
>> didn't want to create a new interface and a library implementing it plus
>> all the functionality around it (like pthread_cond, pthread_barrier, …).
>> Not to mention that if glibc continues to use the "old" locking
>> internally then the application is still affected by the hash-collision
>> locking (or the NUMA problem) should it block on the lock.
>
> There's more futex users than glibc, and some of them are really hurting
> because of the NUMA issue. Oracle used to (I've no idea what they do or
> do not do these days) use sysvsem because the futex hash table was a
> massive bottleneck for them.
>
> And as Nick said, other vendors are having the same problems.

Since we're talking about NUMA, which userspace communities would be
able to provide feedback about the futex2() NUMA-aware feature, to check
if this interface would help solve those issues?

>
> And if you don't extend the futex to store the nid you put the waiter in
> (see all the problems above) you will have to do wakeups on all nodes,
> which is both slower than it is today, and scales possibly even worse.
>
> The whole numa-aware qspinlock saga is in part because of futex.
>
>
> That said; if we're going to do the whole futex-vector thing, we really
> do need a new interface, because the futex multiplex monster is about to
> crumble (see the fun wrt timeouts for example).
>
> And if we're going to do a new interface, we ought to make one that can
> solve all these problems. Now, ideally glibc will bring forth some
> opinions, but if they don't want to play, we'll go back to the good old
> days of non-standard locking libraries.. we're halfway there already due
> to glibc not wanting to break with POSIX where we know POSIX was just
> dead wrong broken.
>
> See: https://github.com/dvhart/librtpi
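
To make the "wakeups on all nodes" point quoted above a bit more
concrete, here is a toy model in Python. It is only a sketch under
stated assumptions, not kernel code and not the futex2 interface: the
class PerNodeFutexTable, its methods and the per-node dictionaries are
all hypothetical names invented for illustration. It assumes one wait
table per NUMA node and counts how many node tables a wake has to probe
when the nid is known versus when it is not.

```python
class PerNodeFutexTable:
    """Toy model (hypothetical): one wait table per NUMA node."""

    def __init__(self, num_nodes):
        # One dict per node, mapping uaddr -> list of waiters.
        self.queues = [dict() for _ in range(num_nodes)]

    def wait(self, uaddr, waiter, nid):
        # The waiter is queued on the table of the node it was put in.
        self.queues[nid].setdefault(uaddr, []).append(waiter)

    def wake_with_nid(self, uaddr, nid):
        # The nid was stored alongside the futex: probe exactly one table.
        woken = self.queues[nid].pop(uaddr, [])
        return woken, 1

    def wake_without_nid(self, uaddr):
        # No nid available: every node's table has to be probed.
        woken = []
        for queue in self.queues:
            woken.extend(queue.pop(uaddr, []))
        return woken, len(self.queues)


if __name__ == "__main__":
    table = PerNodeFutexTable(num_nodes=4)
    table.wait(0x1000, "thread-A", nid=2)
    print(table.wake_with_nid(0x1000, nid=2))
    table.wait(0x1000, "thread-B", nid=3)
    print(table.wake_without_nid(0x1000))
```

In this model the second value returned is the number of node tables
probed, so the cost of a wake without a stored nid grows with the number
of nodes, which is the scaling concern raised in the quoted text.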