On Mon, 24 Oct 2022 05:28:41 +0000 Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:

> Currently mm_struct maintains rss_stats which are updated on page fault
> and the unmapping codepaths. For page fault codepath the updates are
> cached per thread with the batch of TASK_RSS_EVENTS_THRESH which is 64.
> The reason for caching is performance for multithreaded applications
> otherwise the rss_stats updates may become hotspot for such
> applications.
>
> However this optimization comes with the cost of error margin in the rss
> stats. The rss_stats for applications with large number of threads can
> be very skewed. At worst the error margin is (nr_threads * 64) and we
> have a lot of applications with 100s of threads, so the error margin can
> be very high. Internally we had to reduce TASK_RSS_EVENTS_THRESH to 32.
>
> Recently we started seeing the unbounded errors for rss_stats for
> specific applications which use TCP rx0cp. It seems like
> vm_insert_pages() codepath does not sync rss_stats at all.
>
> This patch converts the rss_stats into percpu_counter to convert the
> error margin from (nr_threads * 64) to approximately (nr_cpus ^ 2).

Confused.  The max error should be O(nr_cpus)?

> However this conversion enable us to get the accurate stats for
> situations where accuracy is more important than the cpu cost. Though
> this patch does not make such tradeoffs.

Curiosity.  Can you expand on the final sentence here?

>  8 files changed, 40 insertions(+), 107 deletions(-)

There's that, too.
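
For reference, a rough back-of-the-envelope model of the two error bounds discussed above. This is only a sketch, not kernel code: the function names are hypothetical, and the per-CPU batch formula max(32, 2 * nr_cpus) is an assumption about how the percpu_counter batch scales, not something stated in this thread. Under that assumption the bound grows roughly as nr_cpus ^ 2; with a fixed batch it stays O(nr_cpus).

```python
# Illustrative model only; none of these names are kernel APIs.

TASK_RSS_EVENTS_THRESH = 64  # per-thread batch size quoted in the patch description


def per_thread_worst_case_error(nr_threads, thresh=TASK_RSS_EVENTS_THRESH):
    """Bound quoted in the patch description: every thread may be
    holding up to `thresh` unsynced rss events at the same time."""
    return nr_threads * thresh


def assumed_percpu_batch(nr_cpus):
    """ASSUMPTION: the per-CPU batch scales with the CPU count as
    max(32, 2 * nr_cpus). Not confirmed anywhere in this thread."""
    return max(32, 2 * nr_cpus)


def percpu_worst_case_error(nr_cpus, batch=None):
    """Approximate bound for a per-CPU counter: each CPU may hold
    roughly one batch of updates not yet folded into the global count."""
    if batch is None:
        batch = assumed_percpu_batch(nr_cpus)
    return nr_cpus * batch


if __name__ == "__main__":
    # 500 threads with the per-thread cache
    print(per_thread_worst_case_error(500))
    # 64 CPUs, batch scaling with CPU count (quadratic in nr_cpus)
    print(percpu_worst_case_error(64))
    # 64 CPUs, fixed batch of 32 (linear in nr_cpus)
    print(percpu_worst_case_error(64, batch=32))
```

Either way the per-CPU bound depends on the machine size rather than on the number of threads in the application, which is the property the patch description is after.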