On Mon 21-08-23 20:54:58, Yosry Ahmed wrote:
> Unified flushing allows for great concurrency for paths that attempt to
> flush the stats, at the expense of potential staleness and a single
> flusher paying the extra cost of flushing the full tree.
>
> This tradeoff makes sense for in-kernel flushers that may observe high
> concurrency (e.g. reclaim, refault). For userspace readers, stale stats
> may be unexpected and problematic, especially when such stats are used
> for critical paths such as userspace OOM handling. Additionally, a
> userspace reader will occasionally pay the cost of flushing the entire
> hierarchy, which also causes problems in some cases [1].
>
> Opt userspace reads out of unified flushing. This makes the cost of
> reading the stats more predictable (proportional to the size of the
> subtree), as well as the freshness of the stats. Since userspace readers
> are not expected to have similar concurrency to in-kernel flushers,
> serializing them among themselves and among in-kernel flushers should be
> okay.
>
> This was tested on a machine with 256 cpus by running a synthetic test
> script that creates 50 top-level cgroups, each with 5 children (250
> leaf cgroups). Each leaf cgroup has 10 processes running that allocate
> memory beyond the cgroup limit, invoking reclaim (which is an in-kernel
> unified flusher). Concurrently, one thread is spawned per-cgroup to read
> the stats every second (including root, top-level, and leaf cgroups --
> so total 251 threads). No regressions were observed in the total running
> time; which means that non-unified userspace readers are not slowing
> down in-kernel unified flushers:

I have to admit I am rather confused by cgroup_rstat_flush (and
cgroup_rstat_flush_locked). The former says it can block but the latter
doesn't ever block, and even if it drops the cgroup_rstat_lock it merely
cond_rescheds or busy loops. How much contention and yielding can you
see with this patch? What is the worst case?
How bad can a random user make the situation by going crazy and trying
to flush from many different contexts?
--
Michal Hocko
SUSE Labs