On Tue, Oct 27, 2020 at 8:05 AM Michael Kerrisk (man-pages) <mtk.manpages@xxxxxxxxx> wrote:
> On 10/12/20 4:52 PM, Jann Horn wrote:
> > On Mon, Oct 12, 2020 at 1:49 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> >> Since 34e55232e59f7b19050267a05ff1226e5cd122a5 (introduced back in
> >> v2.6.34), Linux uses per-thread RSS counters to reduce cache contention on
> >> the per-mm counters. With a 4K page size, that means that you can end up
> >> with the counters off by up to 252KiB per thread.
> >
> > Actually, as Mark Mossberg pointed out to me off-thread, the counters
> > can actually be off by many times more...
>
> So, does your patch to proc.5 need tweaking, or can I just apply as is?

The "(up to 63 pages per thread)" would have to go, the rest should be
correct.

But as Michal said, if someone volunteers to get rid of this
optimization, maybe we don't need the documentation after all? But
that would probably require actually doing some careful
heavily-multithreaded benchmarking on a big machine with a few dozen
cores, or something like that, so that we know whether this
optimization actually is unimportant enough that we can just get rid
of it...
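
For reference, the 252KiB number in the quoted text is just 63 pages
times 4KiB. A rough sketch of that arithmetic follows; the names are
made up for illustration, nothing is read from a running kernel, and as
noted above the real error can be many times larger, so this is not an
upper bound:

```python
# Assumptions: 63 pages per thread is the figure quoted in the patch text,
# and 4 KiB is the page size used in the quoted example. Both are
# hard-coded here for illustration only.
QUOTED_PAGES_PER_THREAD = 63
PAGE_SIZE_KIB = 4


def quoted_drift_kib(nthreads: int) -> int:
    """Drift in KiB implied by the quoted per-thread figure."""
    return nthreads * QUOTED_PAGES_PER_THREAD * PAGE_SIZE_KIB


print(quoted_drift_kib(1))   # 252
print(quoted_drift_kib(64))  # 16128
```

Even under that (too optimistic) figure, a process with 64 threads
could be off by roughly 15.75MiB.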