On Mon, Feb 27, 2023 at 5:34 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Fri 24-02-23 13:07:57, Suren Baghdasaryan wrote:
> > On Fri, Feb 24, 2023 at 4:47 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Tue 14-02-23 11:34:30, Suren Baghdasaryan wrote:
> > > [...]
> > > > Your suggestion to have this limit configurable sounds like an obvious
> > > > solution. I would like to get some opinions from other maintainers.
> > > > Johannes, WDYT? CC'ing Michal to chime in as well since this is mostly
> > > > related to memory stalls.
> > >
> > > I do not think that making this configurable helps much. Many users will
> > > be bound to distribution config and also it would be hard to experiment
> > > with a recompile cycle every time. This seems just too impractical.
> > >
> > > Is there any reason why we shouldn't allow any timeout? Shorter
> > > timeouts could be restricted to a privileged context to avoid an easy
> > > way to swamp the system by too frequent polling.
> >
> > Hmm, ok. Maybe then we just ensure that only privileged users can set
> > triggers and remove the min limit (use a >0 check)?
>
> This could break existing userspace which is not privileged. I would
> just go with CAP_SYS_NICE or similar with small (sub min) timeouts.

Yeah, that's what I meant. /proc/pressure/* files already check for
CAP_SYS_RESOURCE
(https://elixir.bootlin.com/linux/latest/source/kernel/sched/psi.c#L1440)
but per-cgroup pressure files do not have this check. I think the
original patch which added this check
(https://lore.kernel.org/all/20210402025833.27599-1-johunt@xxxxxxxxxx/)
missed the cgroup ones. This should be easy to add but I wonder if it
was left that way intentionally. CC'ing the author.
Josh, Johannes, is that inconsistency between system pressure files and
cgroup-specific ones intentional? Can we change them all to check for
CAP_SYS_RESOURCE?

> > > Btw. it seems that there is only a limit on a single trigger per fd
> > > but no limits per user so it doesn't sound too hard to end up with too
> > > much polling even with larger timeouts. To me it seems like we need to
> > > contain the polling thread to be bound by the cpu controller.
> >
> > Hmm. We have one "psimon" thread per cgroup (+1 system-level one) and
> > poll_min_period for each thread is chosen as the min() of polling
> > periods between triggers created in that group. So, a bad trigger that
> > causes overly aggressive polling and the polling thread being throttled
> > might affect other triggers in that cgroup.
>
> Yes, and why would that be a problem?

If unprivileged processes are allowed to add new triggers then a
malicious process can add a bad trigger and affect other legit
processes. That sounds like a problem to me.
Thanks,
Suren.

> --
> Michal Hocko
> SUSE Labs
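
For illustration only, here is a rough userspace sketch of the two ideas in
this thread: gating sub-minimum trigger windows on privilege, and deriving
the per-group polling period as the min() over all triggers in that group.
This is not the kernel code. The helper names window_allowed() and
group_poll_period_us() are hypothetical, the constants are assumed to match
kernel/sched/psi.c at the time of writing, and the "privileged" flag stands
in for whatever capability check (CAP_SYS_RESOURCE, CAP_SYS_NICE, ...) ends
up being chosen.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror kernel/sched/psi.c; verify against your tree. */
#define WINDOW_MIN_US      500000ULL    /* 500ms minimum window */
#define WINDOW_MAX_US      10000000ULL  /* 10s maximum window */
#define UPDATES_PER_WINDOW 10ULL        /* polls per trigger window */

/*
 * Hypothetical policy from the discussion: any window > 0 is accepted
 * from a privileged caller, while unprivileged callers keep the
 * existing minimum so current userspace does not break.
 */
bool window_allowed(uint64_t window_us, bool privileged)
{
	if (window_us == 0 || window_us > WINDOW_MAX_US)
		return false;
	if (window_us < WINDOW_MIN_US && !privileged)
		return false;
	return true;
}

/*
 * One polling thread serves every trigger in a group, so its period is
 * the smallest per-trigger period. A single short window therefore
 * speeds up polling for all triggers in the group.
 * Returns UINT64_MAX when the group has no triggers.
 */
uint64_t group_poll_period_us(const uint64_t *windows_us, size_t n)
{
	uint64_t period = UINT64_MAX;

	for (size_t i = 0; i < n; i++) {
		uint64_t p = windows_us[i] / UPDATES_PER_WINDOW;

		if (p < period)
			period = p;
	}
	return period;
}
```

With windows of 2s and 1s in one group, the sketch yields a 100ms polling
period. Adding a single 50ms trigger drops it to 5ms for everyone in that
group, which is the cross-trigger effect described above.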