On 10/12/20 5:33 PM, Michal Hocko wrote:
> On Mon 12-10-20 17:20:08, Jann Horn wrote:
>> On Mon, Oct 12, 2020 at 5:07 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>>> On Mon 12-10-20 13:49:40, Jann Horn wrote:
>>>> Since 34e55232e59f7b19050267a05ff1226e5cd122a5 (introduced back in
>>>> v2.6.34), Linux uses per-thread RSS counters to reduce cache contention on
>>>> the per-mm counters. With a 4K page size, that means that you can end up
>>>> with the counters off by up to 252KiB per thread.
>>>
>>> Do we actually have any strong case to keep this exception to the
>>> accounting?
>>
>> I have no clue. The concept of "concurrently modified cache lines are
>> bad" seemed vaguely reasonable to me... but I have no idea how much
>> impact this actually has on massively multithreaded processes.
>>
>> I do remember some discussion when imprecision turned out to be a real
>> problem (Android?).
>
> Anyway, I have to say that 34e55232e59f ("mm: avoid false sharing of
> mm_counter") sounds quite dubious to me and it begs for re-evaluation.
Agreed.
- false sharing? no, false sharing is when unrelated things share a cache line,
this is a real sharing of a counter, AFAICS. If the problem is really
exacerbated by false sharing of the counter with something else, then the fix is
to move the counter or something else to a different cache line.
- the evaluation showed 4.5 cache misses per fault reduced to 4; I think 0.5 of a
cache miss is negligible compared to the cost of a page fault
- "Anyway, the most contended object is mmap_sem if the number of threads
grows." - and surprise surprise, 10 years later this is still true :)
Btw. thanks for trying to document this weird behavior. This is
certainly useful but I am suspecting that dropping it might be even
better.