Re: [RFC PATCH 0/1] Add FUTEX_SPIN operation

Hi Christian,

On 26/04/2024 07:26, Christian Brauner wrote:
> On Thu, Apr 25, 2024 at 05:43:31PM -0300, André Almeida wrote:
> > Hi,
> >
> > In the last LPC, Mathieu Desnoyers and I presented[0] a proposal to extend the
> > rseq interface to be able to implement spin locks in userspace correctly. Thomas
> > Gleixner agreed that this is something that Linux could improve, but asked for
> > an alternative proposal first: a futex operation that allows spinning on a user
> > lock inside the kernel. This patchset implements a prototype of this idea for
> > further discussion.
> >
> > With the FUTEX2_SPIN flag set during a futex_wait(), the futex value is expected
> > to be the PID of the lock owner. Then, the kernel gets the task_struct of the
> > corresponding PID, and checks if it's running. It spins until the futex
> > is woken, the task is scheduled out, or a timeout happens. If the lock owner
> > is scheduled out at any time, then the syscall follows the normal path of
> > sleeping as usual.
> >
> > If the futex is woken while we are spinning, we can return to userspace quickly,
> > avoiding being scheduled out and in again to wake from a futex_wait(), thus
> > speeding up the wait operation.
> >
> > I didn't manage to find a good mechanism to prevent race conditions between
> > setting *futex = PID in userspace and doing find_get_task_by_vpid(PID) in kernel
> > space, given that there's enough room for the original PID owner to exit and for
> > that PID to be reallocated to another unrelated task in the system. I didn't perform

> One option would be to also allow pidfds. Starting with v6.9 they can be
> used to reference individual threads.
>
> So for the really fast case where you have multiple threads and you
> really do care about the impact of the atomic_long_inc() on
> pidfd_file->f_count during fdget() (for the single-threaded case the
> increment is elided), callers can pass the TID. But in cases where the
> inc and put aren't performance sensitive, you can use pidfds.


Thank you very much for making the effort here, much appreciated :)

While I agree that pidfds would fix the PID race conditions, I will move this interface to support TIDs instead, as noted by Florian and Peter. With TIDs, the race conditions are diminished, I reckon?
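
To make the TID-based layout concrete, here is a minimal userspace sketch of a lock whose futex word holds 0 when free and the owner's TID when held. It is only an illustration under assumptions: it uses the existing FUTEX_WAIT/FUTEX_WAKE operations rather than the proposed FUTEX2_SPIN flag (which is not part of any released kernel), and the names tid_lock, tid_lock_acquire, tid_lock_release and run_demo are made up for this example. With the proposal, the wait call is where the kernel would read the TID from the word and decide whether to spin or sleep.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock layout: 0 means free, otherwise the owner's TID. */
typedef struct { _Atomic uint32_t word; } tid_lock;

/* Thin wrapper around the raw futex syscall (no timeout, no second address). */
static long futex(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void tid_lock_acquire(tid_lock *l)
{
    uint32_t self = (uint32_t)syscall(SYS_gettid);

    for (;;) {
        uint32_t owner = 0;

        /* Try to move the word from 0 (free) to our own TID. */
        if (atomic_compare_exchange_strong(&l->word, &owner, self))
            return;
        /*
         * 'owner' now holds the TID we observed. Sleep only if the word
         * still contains it; otherwise the call fails with EAGAIN and we
         * retry. A FUTEX2_SPIN-style wait would be issued at this point.
         */
        futex(&l->word, FUTEX_WAIT_PRIVATE, owner);
    }
}

static void tid_lock_release(tid_lock *l)
{
    /* Only the owner may release the lock. */
    assert(atomic_load(&l->word) == (uint32_t)syscall(SYS_gettid));
    atomic_store(&l->word, 0);
    /* Always wake one waiter: simple, at the cost of a syscall per unlock. */
    futex(&l->word, FUTEX_WAKE_PRIVATE, 1);
}

struct demo_arg { tid_lock *lock; long *counter; int iters; };

static void *demo_worker(void *p)
{
    struct demo_arg *a = p;

    for (int i = 0; i < a->iters; i++) {
        tid_lock_acquire(a->lock);
        (*a->counter)++;            /* protected by the lock */
        tid_lock_release(a->lock);
    }
    return NULL;
}

/* Runs up to 16 threads that each increment a shared counter 'iters' times. */
long run_demo(int nthreads, int iters)
{
    enum { MAX_THREADS = 16 };
    tid_lock lock = { 0 };
    long counter = 0;
    struct demo_arg arg = { &lock, &counter, iters };
    pthread_t th[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&th[started], NULL, demo_worker, &arg) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(th[i], NULL);
    return counter;
}
```

Calling run_demo(4, 10000) should return 40000 if the lock provides mutual exclusion. The sketch does nothing about the reuse problem discussed above: if the owner exits while holding the lock, the TID left in the word may later name an unrelated thread.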