Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, May 19, 2020 at 11:45:24PM +0200, Ahmed S. Darwish wrote:
> > @@ -713,10 +713,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
> >   */
> >  void lru_add_drain_all(void)
> >  {
>

Re-adding cut-out comment for context:

	/*
	 * lru_drain_gen - Current generation of pages that could be in vectors
	 *
	 * (A) Definition: lru_drain_gen = x implies that all generations
	 *     0 < n <= x are already scheduled for draining.
	 *
	 * This is an optimization for the highly-contended use case where a
	 * user space workload keeps constantly generating a flow of pages
	 * for each CPU.
	 */

> > +	static unsigned int lru_drain_gen;
> >  	static struct cpumask has_work;
> > +	static DEFINE_MUTEX(lock);
> > +	int cpu, this_gen;
> >
> >  	/*
> >  	 * Make sure nobody triggers this path before mm_percpu_wq is fully
> > @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
> >  	if (WARN_ON(!mm_percpu_wq))
> >  		return;
> >
>

Re-adding cut-out comment for context:

	/*
	 * (B) Cache the LRU draining generation number
	 *
	 * smp_rmb() ensures that the counter is loaded before the mutex is
	 * taken. It pairs with the smp_wmb() inside the mutex critical section
	 * at (D).
	 */

> > +	this_gen = READ_ONCE(lru_drain_gen);
> > +	smp_rmb();
>
> 	this_gen = smp_load_acquire(&lru_drain_gen);

ACK. will do.

> >
> >  	mutex_lock(&lock);
> >
> >  	/*
> > +	 * (C) Exit the draining operation if a newer generation, from another
> > +	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
> >  	 */
> > +	if (unlikely(this_gen != lru_drain_gen))
> >  		goto done;
> >
>

Re-adding cut-out comment for context:

	/*
	 * (D) Increment generation number
	 *
	 * Pairs with READ_ONCE() and smp_rmb() at (B), outside of the critical
	 * section.
	 *
	 * This pairing must be done here, before the for_each_online_cpu loop
	 * below which drains the page vectors.
	 *
	 * Let x, y, and z represent some system CPU numbers, where x < y < z.
	 * Assume CPU #z is in the middle of the for_each_online_cpu loop
	 * below and has already reached CPU #y's per-cpu data. CPU #x comes
	 * along, adds some pages to its per-cpu vectors, then calls
	 * lru_add_drain_all().
	 *
	 * If the paired smp_wmb() below is done at any later step, e.g. after
	 * the loop, CPU #x will just exit at (C) and miss flushing out all of
	 * its added pages.
	 */

> > +	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
> > +	smp_wmb();
>
> You can leave this smp_wmb() out and rely on the smp_mb() implied by
> queue_work_on()'s test_and_set_bit().
>

Won't this be too implicit?

Isn't it possible that, over the years, the queue_work_on() implementation
changes and the test_and_set_bit()/smp_mb() gets removed?

If that happens, this commit will get *silently* broken and the local
CPU pages won't be drained.

> >  	cpumask_clear(&has_work);
> > -
> >  	for_each_online_cpu(cpu) {
> >  		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
> >
>
> While you're here, do:
>
> 	s/cpumask_set_cpu/__&/
>

ACK.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH
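
For readers following the barrier discussion above, here is a rough, self-contained
userspace analogue of the (A)-(D) generation scheme, written with C11 atomics.
Everything in it is invented for illustration: the names (drain_gen, drain_all,
do_drain) are not the mm/swap.c identifiers, a pthread mutex stands in for the
kernel mutex, and the C11 fences only approximate smp_rmb()/smp_wmb(). It sketches
the structure of the scheme under discussion, not the actual patch.

/*
 * Rough userspace analogue of the generation scheme discussed in the
 * thread above. All names are invented for the example; this is NOT the
 * mm/swap.c code, and the C11 fences only approximate the kernel barriers.
 */
#include <stdatomic.h>
#include <stdio.h>
#include <pthread.h>

static atomic_uint drain_gen;		/* (A) generations already scheduled */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void do_drain(unsigned int gen)
{
	/* Stand-in for the for_each_online_cpu() queue/flush loop. */
	printf("draining for generation %u\n", gen);
}

static void drain_all(void)
{
	unsigned int this_gen;

	/*
	 * (B) Snapshot the generation before taking the lock. The acquire
	 * fence stands in for smp_rmb(); the two lines could equally be a
	 * single load-acquire, as suggested in the review.
	 */
	this_gen = atomic_load_explicit(&drain_gen, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);

	pthread_mutex_lock(&lock);

	/*
	 * (C) A newer generation was already scheduled while we waited for
	 * the lock, so our pages are covered by that drain: bail out.
	 */
	if (this_gen != atomic_load_explicit(&drain_gen, memory_order_relaxed))
		goto done;

	/*
	 * (D) Publish the new generation *before* the drain work is
	 * scheduled, so a concurrent caller that added pages meanwhile
	 * cannot take the early exit at (C) while the ongoing loop still
	 * misses those pages. The release fence stands in for smp_wmb().
	 */
	atomic_store_explicit(&drain_gen, this_gen + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	do_drain(this_gen + 1);
done:
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	drain_all();	/* schedules generation 1 */
	drain_all();	/* schedules generation 2 */
	return 0;
}

It should build with an ordinary C11 compiler (e.g. cc -std=c11 -pthread) and
prints one line per scheduled generation; the early exit at (C) only triggers
with genuinely concurrent callers racing on the mutex. Note how the explicit
fence at (D) documents the ordering requirement in the sketch itself, which is
the gist of the objection above to relying on a barrier buried inside
queue_work_on().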