On Sat, Oct 5, 2019 at 10:35 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Fri, 04 Oct 2019 16:09:22 +0300 Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx> wrote: > > > This is very slow operation. There is no reason to do it again if somebody > > else already drained all per-cpu vectors while we waited for lock. > > > > Piggyback on drain started and finished while we waited for lock: > > all pages pended at the time of our enter were drained from vectors. > > > > Callers like POSIX_FADV_DONTNEED retry their operations once after > > draining per-cpu vectors when pages have unexpected references. > > > > ... > > > > --- a/mm/swap.c > > +++ b/mm/swap.c > > @@ -708,9 +708,10 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy) > > */ > > void lru_add_drain_all(void) > > { > > + static seqcount_t seqcount = SEQCNT_ZERO(seqcount); > > static DEFINE_MUTEX(lock); > > static struct cpumask has_work; > > - int cpu; > > + int cpu, seq; > > > > /* > > * Make sure nobody triggers this path before mm_percpu_wq is fully > > @@ -719,7 +720,19 @@ void lru_add_drain_all(void) > > if (WARN_ON(!mm_percpu_wq)) > > return; > > > > + seq = raw_read_seqcount_latch(&seqcount); > > + > > mutex_lock(&lock); > > + > > + /* > > + * Piggyback on drain started and finished while we waited for lock: > > + * all pages pended at the time of our enter were drained from vectors. > > + */ > > + if (__read_seqcount_retry(&seqcount, seq)) > > + goto done; > > + > > + raw_write_seqcount_latch(&seqcount); > > + > > cpumask_clear(&has_work); > > > > for_each_online_cpu(cpu) { > > @@ -740,6 +753,7 @@ void lru_add_drain_all(void) > > for_each_cpu(cpu, &has_work) > > flush_work(&per_cpu(lru_add_drain_work, cpu)); > > > > +done: > > mutex_unlock(&lock); > > } > > I'm not sure this works as intended. > > Suppose CPU #30 is presently executing the for_each_online_cpu() loop > and has reached CPU #15's per-cpu data. > > Now CPU #2 comes along, adds some pages to its per-cpu vectors then > calls lru_add_drain_all(). AFAICT the code will assume that CPU #30 > has flushed out all of the pages which CPU #2 just added, but that > isn't the case. > > Moving the raw_write_seqcount_latch() to the point where all processing > has completed might fix? > > No, raw_write_seqcount_latch() should be exactly before draining. Here seqcount works as generation of pages that could be in vectors. And all steps are serialized by mutex: only after taking lock we could be sure that all previous generations are gone. Here CPU #2 will see same generation at entry and after taking lock. So it will drain own pages.