Re: [PATCH] psi: reduce min window size to 50ms

Suren Baghdasaryan <surenb@xxxxxxxxxx> · Mon, 27 Feb 2023 09:49:59 -0800

On Mon, Feb 27, 2023 at 5:34 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Fri 24-02-23 13:07:57, Suren Baghdasaryan wrote:
> > On Fri, Feb 24, 2023 at 4:47 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Tue 14-02-23 11:34:30, Suren Baghdasaryan wrote:
> > > [...]
> > > > Your suggestion to have this limit configurable sounds like obvious
> > > > solution. I would like to get some opinions from other maintainers.
> > > > Johannes, WDYT? CC'ing Michal to chime in as well since this is mostly
> > > > related to memory stalls.
> > >
> > > I do not think that making this configurable helps much. Many users will
> > > be bound to distribution config and also it would be hard to experiment
> > > with a recompile cycle every time. This seems just too impractical.
> > >
> > > Is there any reason why we shouldn't allow any timeout? Shorter
> > > timeouts could be restricted to a priviledged context to avoid an easy
> > > way to swamp system by too frequent polling.
> >
> > Hmm, ok. Maybe then we just ensure that only privileged users can set
> > triggers and remove the min limit (use a >0 check)?
>
> This could break existing userspace which is not privileged. I would
> just go with CAP_SYS_NICE or similar with small (sub min) timeouts.

Yeah, that's what I meant. /proc/pressure/* files already check for
CAP_SYS_RESOURCE
(https://elixir.bootlin.com/linux/latest/source/kernel/sched/psi.c#L1440)
but per-cgroup pressure files do not have this check. I think the
original patch which added this check
(https://lore.kernel.org/all/20210402025833.27599-1-johunt@xxxxxxxxxx/)
missed the cgroup ones. This should be easy to add but I wonder if
that was left that way intentionally.

CC'ing the author. Josh, Johannes is that inconsistency between system
pressure files and cgroup-specific ones intentional? Can we change
them all to check for CAP_SYS_RESOURCE?

>
> > > Btw. it seems that there is is only a limit on a single trigger per fd
> > > but no limits per user so it doesn't sound too hard to end up with too
> > > much polling even with a larger timeouts. To me it seems like we need to
> > > contain the polling thread to be bound by the cpu controller.
> >
> > Hmm. We have one "psimon" thread per cgroup (+1 system-level one) and
> > poll_min_period for each thread is chosen as the min() of polling
> > periods between triggers created in that group. So, a bad trigger that
> > causes overly aggressive polling and polling thread being throttled,
> > might affect other triggers in that cgroup.
>
> Yes, and why that would be a problem?

If unprivileged processes are allowed to add new triggers then a
malicious process can add a bad trigger and affect other legit
processes. That sounds like a problem to me.
Thanks,
Suren.

> --
> Michal Hocko
> SUSE Labs