On Tue, 19 May 2020 23:45:24 +0200 "Ahmed S. Darwish" wrote:
> 
> Commit eef1a429f234 ("mm/swap.c: piggyback lru_add_drain_all() calls")
> implemented an optimization mechanism to exit the to-be-started LRU
> drain operation (name it A) if another drain operation *started and
> finished* while (A) was blocked on the LRU draining mutex.
> 
> This was done through a seqcount latch, which is an abuse of its
> semantics:
> 
>   1. Seqcount latching should be used for the purpose of switching
>      between two storage places with sequence protection to allow
>      interruptible, preemptible writer sections. The optimization
>      mechanism has absolutely nothing to do with that.
> 
>   2. The used raw_write_seqcount_latch() has two smp write memory
>      barriers to always ensure one consistent storage place out of the
>      two storage places available. This extra smp_wmb() is redundant for
>      the optimization use case.
> 
> Besides the API abuse, the semantics of a latch sequence counter were
> force-fitted into the optimization. What was actually meant is to track
> generations of LRU draining operations, where "current lru draining
> generation = x" implies that all generations 0 < n <= x are already
> *scheduled* for draining.
> 
> Remove the conceptually-inappropriate seqcount latch usage and manually
> implement the optimization using a counter and SMP memory barriers.
> 
> Link: https://lkml.kernel.org/r/CALYGNiPSr-cxV9MX9czaVh6Wz_gzSv3H_8KPvgjBTGbJywUJpA@xxxxxxxxxxxxxx
> Signed-off-by: Ahmed S. Darwish <a.darwish@xxxxxxxxxxxxx>
> ---
>  mm/swap.c | 57 +++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 47 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index bf9a79fed62d..d6910eeed43d 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -713,10 +713,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
>   */
>  void lru_add_drain_all(void)
>  {
> -	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
> -	static DEFINE_MUTEX(lock);
> +	/*
> +	 * lru_drain_gen - Current generation of pages that could be in vectors
> +	 *
> +	 * (A) Definition: lru_drain_gen = x implies that all generations
> +	 *     0 < n <= x are already scheduled for draining.
> +	 *
> +	 * This is an optimization for the highly-contended use case where a
> +	 * user space workload keeps constantly generating a flow of pages
> +	 * for each CPU.
> +	 */
> +	static unsigned int lru_drain_gen;
>  	static struct cpumask has_work;
> -	int cpu, seq;
> +	static DEFINE_MUTEX(lock);
> +	int cpu, this_gen;
> 
>  	/*
>  	 * Make sure nobody triggers this path before mm_percpu_wq is fully
> @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
>  	if (WARN_ON(!mm_percpu_wq))
>  		return;
> 
> -	seq = raw_read_seqcount_latch(&seqcount);
> +	/*
> +	 * (B) Cache the LRU draining generation number
> +	 *
> +	 * smp_rmb() ensures that the counter is loaded before the mutex is
> +	 * taken. It pairs with the smp_wmb() inside the mutex critical section
> +	 * at (D).
> +	 */
> +	this_gen = READ_ONCE(lru_drain_gen);
> +	smp_rmb();
> 
>  	mutex_lock(&lock);
> 
>  	/*
> -	 * Piggyback on drain started and finished while we waited for lock:
> -	 * all pages pended at the time of our enter were drained from vectors.
> +	 * (C) Exit the draining operation if a newer generation, from another
> +	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
>  	 */
> -	if (__read_seqcount_retry(&seqcount, seq))
> +	if (unlikely(this_gen != lru_drain_gen))
>  		goto done;
> 
> -	raw_write_seqcount_latch(&seqcount);
> +	/*
> +	 * (D) Increment generation number
> +	 *
> +	 * Pairs with READ_ONCE() and smp_rmb() at (B), outside of the critical
> +	 * section.
> +	 *
> +	 * This pairing must be done here, before the for_each_online_cpu loop
> +	 * below which drains the page vectors.
> +	 *
> +	 * Let x, y, and z represent some system CPU numbers, where x < y < z.
> +	 * Assume CPU #z is in the middle of the for_each_online_cpu loop
> +	 * below and has already reached CPU #y's per-cpu data. CPU #x comes
> +	 * along, adds some pages to its per-cpu vectors, then calls
> +	 * lru_add_drain_all().
> +	 *
> +	 * If the paired smp_wmb() below is done at any later step, e.g. after
> +	 * the loop, CPU #x will just exit at (C) and miss flushing out all of
> +	 * its added pages.
> +	 */
> +	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
> +	smp_wmb();

The writer is inside the mutex.

> 
>  	cpumask_clear(&has_work);
> -
>  	for_each_online_cpu(cpu) {
>  		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
> 
> @@ -766,7 +803,7 @@ void lru_add_drain_all(void)
>  {
>  	lru_add_drain();
>  }
> -#endif
> +#endif	/* CONFIG_SMP */
> 
>  /**
>   * release_pages - batched put_page()
> -- 
> 2.20.1

The tiny diff below, inspired by this work, tries to take the lock with
mutex_trylock() instead, by switching the sequence counter's readers and
writer across the mutex boundary.

--- a/mm/swap.c
+++ b/mm/swap.c
@@ -714,10 +714,12 @@ static void lru_add_drain_per_cpu(struct
  */
 void lru_add_drain_all(void)
 {
-	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
+	static unsigned int lru_drain_gen;
 	static DEFINE_MUTEX(lock);
 	static struct cpumask has_work;
-	int cpu, seq;
+	int cpu;
+	int loop = 0;
+	unsigned int seq;
 
 	/*
 	 * Make sure nobody triggers this path before mm_percpu_wq is fully
@@ -726,18 +728,12 @@ void lru_add_drain_all(void)
 	if (WARN_ON(!mm_percpu_wq))
 		return;
 
-	seq = raw_read_seqcount_latch(&seqcount);
+	seq = lru_drain_gen++;
+	smp_mb();
 
-	mutex_lock(&lock);
-
-	/*
-	 * Piggyback on drain started and finished while we waited for lock:
-	 * all pages pended at the time of our enter were drained from vectors.
-	 */
-	if (__read_seqcount_retry(&seqcount, seq))
-		goto done;
-
-	raw_write_seqcount_latch(&seqcount);
+	if (!mutex_trylock(&lock))
+		return;
+more_work:
 
 	cpumask_clear(&has_work);
 
@@ -759,7 +755,11 @@ void lru_add_drain_all(void)
 	for_each_cpu(cpu, &has_work)
 		flush_work(&per_cpu(lru_add_drain_work, cpu));
 
-done:
+	smp_mb();
+	if (++seq != lru_drain_gen && loop++ < 128)
+		goto more_work;
+
+	smp_mb();
 	mutex_unlock(&lock);
 }
 #else
--
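For anyone who wants to play with the piggyback scheme outside the kernel,
here is a minimal user-space sketch of the generation-counter idea from the
patch above. It is only a model, not kernel code: pthread_mutex_t stands in
for the kernel mutex, C11 atomics and fences are rough stand-ins for
READ_ONCE()/WRITE_ONCE() and smp_rmb()/smp_wmb(), and drain_all_cpus() is a
made-up placeholder for the for_each_online_cpu() drain loop.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_uint drain_gen;	/* models lru_drain_gen, starts at 0 */
static pthread_mutex_t drain_lock = PTHREAD_MUTEX_INITIALIZER;

/* made-up placeholder for scheduling and flushing the per-CPU drain work */
static void drain_all_cpus(void)
{
	puts("draining all cpus");
}

static void drain_all(void)
{
	/* (B) snapshot the generation before taking the lock */
	unsigned int this_gen = atomic_load_explicit(&drain_gen,
						     memory_order_relaxed);
	/* acquire fence as a rough smp_rmb() stand-in */
	atomic_thread_fence(memory_order_acquire);

	pthread_mutex_lock(&drain_lock);

	/*
	 * (C) a newer generation was opened while we waited for the lock,
	 * so everything pended before our call is already covered: bail.
	 */
	if (this_gen != atomic_load_explicit(&drain_gen,
					     memory_order_relaxed))
		goto done;

	/* (D) open a new generation before doing the actual drain ... */
	atomic_store_explicit(&drain_gen, this_gen + 1,
			      memory_order_relaxed);
	/* ... keeping that store ordered first (rough smp_wmb() stand-in) */
	atomic_thread_fence(memory_order_release);

	drain_all_cpus();

done:
	pthread_mutex_unlock(&drain_lock);
}

int main(void)
{
	drain_all();
	drain_all();
	return 0;
}

The trylock variant above differs mainly in that the caller bumps the
generation before trying the lock, and the re-check (with the bounded
more_work retry) happens after the drain loop rather than before it.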