On Mon, Feb 27, 2023 at 11:19 AM Josh Hunt <johunt@xxxxxxxxxx> wrote: > > > > On 2/27/23 9:49 AM, Suren Baghdasaryan wrote: > > On Mon, Feb 27, 2023 at 5:34 AM Michal Hocko <mhocko@xxxxxxxx> wrote: > >> > >> On Fri 24-02-23 13:07:57, Suren Baghdasaryan wrote: > >>> On Fri, Feb 24, 2023 at 4:47 AM Michal Hocko <mhocko@xxxxxxxx> wrote: > >>>> > >>>> On Tue 14-02-23 11:34:30, Suren Baghdasaryan wrote: > >>>> [...] > >>>>> Your suggestion to have this limit configurable sounds like obvious > >>>>> solution. I would like to get some opinions from other maintainers. > >>>>> Johannes, WDYT? CC'ing Michal to chime in as well since this is mostly > >>>>> related to memory stalls. > >>>> > >>>> I do not think that making this configurable helps much. Many users will > >>>> be bound to distribution config and also it would be hard to experiment > >>>> with a recompile cycle every time. This seems just too impractical. > >>>> > >>>> Is there any reason why we shouldn't allow any timeout? Shorter > >>>> timeouts could be restricted to a priviledged context to avoid an easy > >>>> way to swamp system by too frequent polling. > >>> > >>> Hmm, ok. Maybe then we just ensure that only privileged users can set > >>> triggers and remove the min limit (use a >0 check)? > >> > >> This could break existing userspace which is not privileged. I would > >> just go with CAP_SYS_NICE or similar with small (sub min) timeouts. > > > > Yeah, that's what I meant. /proc/pressure/* files already check for > > CAP_SYS_RESOURCE > > (https://urldefense.com/v3/__https://elixir.bootlin.com/linux/latest/source/kernel/sched/psi.c*L1440__;Iw!!GjvTz_vk!WtI61poYlZk9kg5P1sX19RdYnUNGvBJRjnOpu8hL6IOZ_NKhuw2qZ_tAdNRwzZoQVlO4jEObYN6x$ ) > > but per-cgroup pressure files do not have this check. I think the > > original patch which added this check > > (https://urldefense.com/v3/__https://lore.kernel.org/all/20210402025833.27599-1-johunt@xxxxxxxxxx/__;!!GjvTz_vk!WtI61poYlZk9kg5P1sX19RdYnUNGvBJRjnOpu8hL6IOZ_NKhuw2qZ_tAdNRwzZoQVlO4jAVqIVDv$ ) > > missed the cgroup ones. This should be easy to add but I wonder if > > that was left that way intentionally. > > > > CC'ing the author. Josh, Johannes is that inconsistency between system > > pressure files and cgroup-specific ones intentional? Can we change > > them all to check for CAP_SYS_RESOURCE? > > No, this was just an oversight in the original patch at least from my > end, and did not come up during code review. Fine with me to change them > all to use CAP_SYS_RESOURCE. Thanks for the confirmation! Will get this fixed. > > Josh > > > > >> > >>>> Btw. it seems that there is is only a limit on a single trigger per fd > >>>> but no limits per user so it doesn't sound too hard to end up with too > >>>> much polling even with a larger timeouts. To me it seems like we need to > >>>> contain the polling thread to be bound by the cpu controller. > >>> > >>> Hmm. We have one "psimon" thread per cgroup (+1 system-level one) and > >>> poll_min_period for each thread is chosen as the min() of polling > >>> periods between triggers created in that group. So, a bad trigger that > >>> causes overly aggressive polling and polling thread being throttled, > >>> might affect other triggers in that cgroup. > >> > >> Yes, and why that would be a problem? > > > > If unprivileged processes are allowed to add new triggers then a > > malicious process can add a bad trigger and affect other legit > > processes. That sounds like a problem to me. > > Thanks, > > Suren. > > > >> -- > >> Michal Hocko > >> SUSE Labs