On Wed, Aug 23, 2023 at 12:33 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 22-08-23 08:30:05, Yosry Ahmed wrote:
> > On Tue, Aug 22, 2023 at 2:06 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Mon 21-08-23 20:54:58, Yosry Ahmed wrote:
> [...]
> > So to answer your question, I don't think a random user can really
> > affect the system in a significant way by constantly flushing. In
> > fact, in the test script (which I am now attaching, in case you're
> > interested), there are hundreds of threads that are reading stats
> > of different cgroups every 1s, and I don't see any negative effects
> > on in-kernel flushers in this case (reclaimers).
>
> I suspect you have missed my point.

I suspect you are right :)

> Maybe I am just misunderstanding the code, but it seems to me that
> the lock dropping inside cgroup_rstat_flush_locked effectively
> allows an unbounded number of contenders, which is really dangerous
> when it is triggerable from userspace. The number of spinners at any
> moment is bounded by the number of CPUs, but depending on timing,
> many potential spinners might be cond_resched()'ed away, and the
> worst-case latency to complete can be really high. Makes more sense?

I think I understand better now. So basically, because we might drop
the lock and resched, there can be nr_cpus spinners plus other
would-be spinners that are currently scheduled away; the latter will
need to wait to be scheduled before they can start spinning on the
lock. This may happen multiple times for one reader during its read,
which is what can cause a high worst-case latency. I hope I
understood you correctly this time. Did I?

So the logic to give up the lock and sleep was introduced by commit
0fa294fb1985 ("cgroup: Replace cgroup_rstat_mutex with a spinlock")
in 4.18. It has been possible for userspace to trigger this scenario
by reading cpu.stat, which has been using rstat since then. On the
memcg side, it was also possible to trigger this behavior between
commit 2d146aa3aa84 ("mm: memcontrol: switch to rstat") and commit
fd25a9e0e23b ("memcg: unify memcg stat flushing") (i.e. between 5.13
and 5.16). I am not sure there have been any problems from this, but
perhaps Tejun can answer that better than I can.

The way I think about it is that random userspace reads will mostly
be reading their own subtrees, which are generally not very large
(and can be limited), so every individual read should be cheap
enough. Also, subsequent flushes of overlapping subtrees will have
very little to do, as there won't be many pending updates, so they
should also be very cheap.

So unless multiple jobs on the same machine are collectively trying
to act maliciously (purposefully or not) by concurrently spawning
multiple readers of different parts of the hierarchy (and maintaining
enough activity to generate stat updates to flush), I don't think
it's a concern. I also imagine (but haven't checked) that there is
some locking at some level that will throttle a malicious job that
spawns multiple readers of the same memory.stat file.

I hope this answers your question.
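
For reference, here is roughly the yielding logic in question,
paraphrased from kernel/cgroup/rstat.c (the per-cpu flushing itself
is elided, and details vary across the kernel versions mentioned
above), with a comment marking where the contention can pile up:

    /*
     * Sketch of cgroup_rstat_flush_locked(), entered with
     * cgroup_rstat_lock held.  Paraphrased, not verbatim.
     */
    static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
    {
            int cpu;

            lockdep_assert_held(&cgroup_rstat_lock);

            for_each_possible_cpu(cpu) {
                    /* ... pop this CPU's updated subtree and flush it
                     * under the per-cpu lock ... */

                    /* if @may_sleep, play nice and yield if necessary */
                    if (may_sleep && (need_resched() ||
                                      spin_needbreak(&cgroup_rstat_lock))) {
                            /*
                             * This is the window discussed above: any
                             * spinner (at most nr_cpus at a time) can
                             * grab the lock here, and we may be
                             * scheduled away before re-acquiring it, so
                             * the total number of waiters over a single
                             * flush is not bounded.
                             */
                            spin_unlock_irq(&cgroup_rstat_lock);
                            if (!cond_resched())
                                    cpu_relax();
                            spin_lock_irq(&cgroup_rstat_lock);
                    }
            }
    }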