On Wed, May 01, 2024 at 08:44:36PM -0300, André Almeida wrote:
> Hi Christian,
>
> On 26/04/2024 07:26, Christian Brauner wrote:
> > On Thu, Apr 25, 2024 at 05:43:31PM -0300, André Almeida wrote:
> > > Hi,
> > >
> > > In the last LPC, Mathieu Desnoyers and I presented[0] a proposal to extend the
> > > rseq interface to be able to implement spin locks in userspace correctly. Thomas
> > > Gleixner agreed that this is something that Linux could improve, but asked for
> > > an alternative proposal first: a futex operation that allows to spin a user
> > > lock inside the kernel. This patchset implements a prototype of this idea for
> > > further discussion.
> > >
> > > With FUTEX2_SPIN flag set during a futex_wait(), the futex value is expected to
> > > be the PID of the lock owner. Then, the kernel gets the task_struct of the
> > > corresponding PID, and checks if it's running. It spins until the futex
> > > is awaken, the task is scheduled out or if a timeout happens. If the lock owner
> > > is scheduled out at any time, then the syscall follows the normal path of
> > > sleeping as usual.
> > >
> > > If the futex is awaken and we are spinning, we can return to userspace quickly,
> > > avoid the scheduling out and in again to wake from a futex_wait(), thus
> > > speeding up the wait operation.
> > >
> > > I didn't manage to find a good mechanism to prevent race conditions between
> > > setting *futex = PID in userspace and doing find_get_task_by_vpid(PID) in kernel
> > > space, giving that there's enough room for the original PID owner exit and such
> > > PID to be relocated to another unrelated task in the system. I didn't performed
> >
> > One option would be to also allow pidfds. Starting with v6.9 they can be
> > used to reference individual threads.
> >
> > So for the really fast case where you have multiple threads and you
> > somehow may really do care about the impact of the atomic_long_inc() on
> > pidfd_file->f_count during fdget() (for the single-threaded case the
> > increment is elided), callers can pass the TID. But in cases where the
> > inc and put aren't a performance sensitive, you can use pidfds.
> >
> > Thank you very much for making the effort here, much appreciated :)
>
> While I agree that pidfds would fix the PID race conditions, I will move
> this interface to support TIDs instead, as noted by Florian and Peter. With
> TID the race conditions are diminished I reckon?

Unless I'm missing something, the question here is PID (as in TGID, aka
the thread-group leader ID returned by getpid()) vs TID (the
thread-specific ID returned by gettid()). You want the thread-specific
ID because you want to interact with the futex state of a specific
thread, not the thread-group leader.

Aside from that, TIDs are subject to the same race conditions as PIDs.
Both are allocated from the same pool (see alloc_pid()).
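
To make the getpid()/gettid() distinction concrete, here is a minimal
userspace sketch. It is not taken from the patchset; struct ids,
my_gettid() and collect_worker_ids() are names made up for illustration.
It uses the raw SYS_gettid syscall so that it does not depend on a glibc
new enough to ship a gettid() wrapper.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: thread-specific id of the calling thread. */
static pid_t my_gettid(void)
{
	return (pid_t)syscall(SYS_gettid);
}

/* Illustrative container for the two ids as seen by one thread. */
struct ids {
	pid_t tgid;	/* what getpid() returns: the thread-group id */
	pid_t tid;	/* what gettid() returns: the per-thread id */
};

static void *worker(void *arg)
{
	struct ids *out = arg;

	out->tgid = getpid();
	out->tid = my_gettid();
	return NULL;
}

/*
 * Spawn one thread and record the ids it observes.
 * Returns 0 on success, -1 if the thread could not be created or joined.
 */
static int collect_worker_ids(struct ids *out)
{
	pthread_t t;

	if (pthread_create(&t, NULL, worker, out) != 0)
		return -1;
	if (pthread_join(t, NULL) != 0)
		return -1;
	return 0;
}
```

Built with -pthread and called from the main thread, the worker should
report the same tgid as the caller's getpid() but a tid that differs
from the caller's. A lock word holding getpid() would therefore name the
thread-group leader no matter which thread actually took the lock. Note
that this only shows which id identifies the owner; it does nothing
about the reuse race, since both ids come from the same pool.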