On 06/07/23 00:39, Peter Zijlstra wrote: > On Wed, Jul 05, 2023 at 07:12:53PM +0100, Valentin Schneider wrote: > >> Note: A previous approach by PeterZ [1] used an extra bit in >> context_tracking.state to flag the presence of deferred callbacks to >> execute, and the actual callbacks were stored in a separate atomic >> variable. >> >> This meant that the atomic read of context_tracking.state was sufficient to >> determine whether there are any deferred callbacks to execute. >> Unfortunately, it presents a race window. Consider the work setting >> function as: >> >> preempt_disable(); >> seq = atomic_read(&ct->seq); >> if (__context_tracking_seq_in_user(seq)) { >> /* ctrl-dep */ >> atomic_or(work, &ct->work); >> ret = atomic_try_cmpxchg(&ct->seq, &seq, seq|CT_SEQ_WORK); >> } >> preempt_enable(); >> >> return ret; >> >> Then the following can happen: >> >> CPUx CPUy >> CT_SEQ_WORK \in context_tracking.state >> atomic_or(WORK_N, &ct->work); >> ct_kernel_enter() >> ct_state_inc(); >> atomic_try_cmpxchg(&ct->seq, &seq, seq|CT_SEQ_WORK); >> >> The cmpxchg() would fail, ultimately causing an IPI for WORK_N to be >> sent. Unfortunately, the work bit would remain set, and it can't be sanely >> cleared in case another CPU set it concurrently - this would ultimately >> lead to a double execution of the callback, one as a deferred callback and >> one in the IPI. As not all IPI callbacks are idempotent, this is >> undesirable. > > So adding another atomic is arguably worse. > > The thing is, if the NOHZ_FULL CPU is actually doing context transitions > (SYSCALLs etc..) then everything is fundamentally racy, there is no > winning that game, we could find the remote CPU is in-kernel, send an > IPI, the remote CPU does return-to-user and receives the IPI. > > And then the USER is upset... because he got an IPI. > Yeah, that part is inevitably racy. The thing I was especially worried about was the potential double executions (once in IPI, again in deferred work). It's not /too/ bad as the only two deferred callbacks I'm introducing here are costly-but-stateless, but IMO is a bad foundation. But it seems like we can reuse the existing atomic and squeeze some bits in there, so let's see how that goes :-)