On 04/10/2019 16.39, Michal Hocko wrote:
On Fri 04-10-19 16:32:39, Konstantin Khlebnikov wrote:
On 04/10/2019 16.12, Michal Hocko wrote:
On Fri 04-10-19 16:09:22, Konstantin Khlebnikov wrote:
This is a very slow operation. There is no reason to do it again if somebody
else already drained all per-cpu vectors while we waited for the lock.
Piggyback on a drain that started and finished while we waited for the lock:
all pages pending at the time of our entry were drained from the vectors.
Callers like POSIX_FADV_DONTNEED retry their operations once after
draining per-cpu vectors when pages have unexpected references.
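The piggyback idea above can be illustrated with a small userspace sketch.
This is not the kernel code: it assumes a pthread mutex and a plain generation
counter, and every name in it (drain_lock, drain_gen, drain_enter,
drain_finish, drain_all_vectors, full_drains) is hypothetical. A caller
records the generation on entry. If the generation has changed by the time
the caller holds the lock, a complete drain started after its entry and has
already finished, so the expensive work can be skipped.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* All names here are hypothetical; this is a sketch, not the kernel code. */
static pthread_mutex_t drain_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_uint drain_gen;     /* bumped once at the start of each full drain */
static unsigned int full_drains;  /* counts expensive drains; protected by drain_lock */

/* Stand-in for the expensive walk over every per-cpu vector. */
static void drain_all_vectors(void)
{
    full_drains++;
}

/* Step 1: on entry, remember which drain generation we observed. */
static unsigned int drain_enter(void)
{
    return atomic_load(&drain_gen);
}

/*
 * Step 2: take the lock, possibly after a long wait. The generation is only
 * bumped under the lock, so if it moved since drain_enter(), a whole drain
 * started after our entry and finished before we got the lock.
 * Returns true if this caller had to run the drain itself.
 */
static bool drain_finish(unsigned int seen_gen)
{
    bool drained = false;

    pthread_mutex_lock(&drain_lock);
    if (atomic_load(&drain_gen) == seen_gen) {
        atomic_fetch_add(&drain_gen, 1);  /* mark the start of a new drain */
        drain_all_vectors();
        drained = true;
    }
    pthread_mutex_unlock(&drain_lock);
    return drained;
}
```

A real caller would do both steps in one function. They are split here only
so that the interleaving of two waiters can be followed step by step: if two
callers both enter before either takes the lock, only the first one drains.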
This describes why we need to wait for preexisting pages on the pvecs, but
the changelog doesn't say anything about the improvements this leads to.
In other words, what kind of workloads benefit from it?
Right now POSIX_FADV_DONTNEED is the top user because it has to freeze the page
reference when it removes a page from the cache. invalidate_bdev calls it for
the same reason. Both are triggered from userspace, so it's easy to generate a storm.
mlock/mlockall no longer call lru_add_drain_all; I've seen serious
slowdowns there on older kernels.
There are some less obvious paths in memory migration/CMA/offlining
which shouldn't be called frequently.
Can you back those claims with any numbers?
Well, the worst case requires a non-trivial workload because lru_add_drain_all
skips cpus where the vectors are empty. Something must constantly generate a
flow of pages on each cpu. The cpus must also be busy, so that scheduling the
per-cpu work is slower. And the machine must be big enough (64+ cpus in our case).
In our case that was a massive series of mlock calls in map-reduce while other
tasks wrote logs (and generated a flow of new pages in the per-cpu vectors). The
mlock calls were serialized by the mutex and accumulated latency of 10 seconds or more.
The kernel has not called lru_add_drain_all on mlock paths since 4.15, but the same
scenario could be triggered by fadvise(POSIX_FADV_DONTNEED) or any other remaining user.
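
For reference, a minimal sketch of the userspace side of that path. It uses
only the standard posix_fadvise() and fsync() calls; the helper name
drop_cached_pages is hypothetical. Whether a given call ends up in
lru_add_drain_all depends on the kernel version and on whether pages still
have unexpected references, so this only shows how the request is issued.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to drop the cached pages of an open
 * file. Dirty pages cannot be dropped, so write them back first.
 * Returns 0 on success or a positive errno-style value on failure.
 */
static int drop_cached_pages(int fd)
{
    if (fsync(fd) != 0)
        return errno;

    /* offset 0 and len 0 mean "the whole file"; the call returns the error number directly */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

Many tasks calling such a helper in a loop on a large, busy machine would be
one way to produce the kind of storm described above.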