On Thu, Mar 2, 2023 at 7:30 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Mar 01, 2023 at 12:48:38PM -0800, Suren Baghdasaryan wrote:
> > On Wed, Mar 1, 2023 at 12:07 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > >
> > > On Wed, Mar 01, 2023 at 11:34:03AM -0800, Suren Baghdasaryan wrote:
> > > > Current 500ms min window size for psi triggers limits polling interval
> > > > to 50ms to prevent polling threads from using too much cpu bandwidth by
> > > > polling too frequently. However the number of cgroups with triggers is
> > > > unlimited, so this protection can be defeated by creating multiple
> > > > cgroups with psi triggers (triggers in each cgroup are served by a single
> > > > "psimon" kernel thread).
> > > > Instead of limiting min polling period, which also limits the latency of
> > > > psi events, it's better to limit psi trigger creation to authorized users
> > > > only, like we do for system-wide psi triggers (/proc/pressure/* files can
> > > > be written only by processes with CAP_SYS_RESOURCE capability). This also
> > > > makes access rules for cgroup psi files consistent with system-wide ones.
> > > > Add a CAP_SYS_RESOURCE capability check for cgroup psi file writers and
> > > > remove the psi window min size limitation.
> > > >
> > > > Suggested-by: Sudarshan Rajagopalan <quic_sudaraja@xxxxxxxxxxx>
> > > > Link: https://lore.kernel.org/all/cover.1676067791.git.quic_sudaraja@xxxxxxxxxxx/
> > > > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > > > ---
> > > >  kernel/cgroup/cgroup.c | 10 ++++++++++
> > > >  kernel/sched/psi.c     |  4 +---
> > > >  2 files changed, 11 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> > > > index 935e8121b21e..b600a6baaeca 100644
> > > > --- a/kernel/cgroup/cgroup.c
> > > > +++ b/kernel/cgroup/cgroup.c
> > > > @@ -3867,6 +3867,12 @@ static __poll_t cgroup_pressure_poll(struct kernfs_open_file *of,
> > > >       return psi_trigger_poll(&ctx->psi.trigger, of->file, pt);
> > > >  }
> > > >
> > > > +static int cgroup_pressure_open(struct kernfs_open_file *of)
> > > > +{
> > > > +     return (of->file->f_mode & FMODE_WRITE && !capable(CAP_SYS_RESOURCE)) ?
> > > > +             -EPERM : 0;
> > > > +}
> > >
> > > I agree with the change, but it's a bit unfortunate that this check is
> > > duplicated between system and cgroup.
> > >
> > > What do you think about psi_trigger_create() taking the file and
> > > checking FMODE_WRITE and CAP_SYS_RESOURCE against file->f_cred?
> >
> > That's definitely doable and we don't even need to pass file to
> > psi_trigger_create() since it's called only when we write to the file.
> > However by moving the capability check into psi_trigger_create() we
> > also postpone the check until write() instead of failing early in
> > open(). I always assumed failing early is preferable but if
> > consolidating the code here makes more sense then I can make the
> > switch. Please let me know if you still prefer to move the check.
>
> Just for context, a person on our team is working on allowing
> unprivileged polls with windows that are multiples of 2s, which can be
> triggered from the regular aggregator threads. This should be useful
> for container delegation, and also for the desktop monitor app usecase
> that Chris Down brought up some time ago. At that point, everybody can
> open the file for write, and permissions are checked against the
> trigger parameters.
>
> So I don't think it's a big deal to check this particular permission
> at write time. But if you prefer we can also merge your patch as-is
> and do the refactor as part of the other series.

Let's roll this check without additional changes and then consolidate
the checking inside psi_trigger_create() in a separate patch. If
anybody objects to the late permission check we will just revert that
last change without affecting anything else.

>
> Your call. In either case, please feel free to add
>
> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>

Thanks! Will post the final patch with Acks later today. Originally it
was a purely cgroup-related change but now it's more of a PSI change.
Therefore Peter's tree will probably be the right place for it.
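
For readers following the thread, the open()-time versus write()-time
trade-off discussed above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not kernel code:
every identifier prefixed with model_ or MODEL_ is hypothetical, the
has_cap flag stands in for capable(CAP_SYS_RESOURCE), and the only thing
taken from the patch is the predicate used in cgroup_pressure_open().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's FMODE_READ / FMODE_WRITE. */
#define MODEL_FMODE_READ  0x1u
#define MODEL_FMODE_WRITE 0x2u

/* Where the permission check is performed in this model. */
enum model_check_point { MODEL_CHECK_AT_OPEN, MODEL_CHECK_AT_WRITE };

/*
 * Same predicate as cgroup_pressure_open() in the quoted patch:
 * writers without the capability get -EPERM, everyone else passes.
 * has_cap models capable(CAP_SYS_RESOURCE).
 */
int model_perm_check(unsigned int f_mode, bool has_cap)
{
	return ((f_mode & MODEL_FMODE_WRITE) && !has_cap) ? -EPERM : 0;
}

/* Models open(): only the early-check variant can fail here. */
int model_open(enum model_check_point where, unsigned int f_mode, bool has_cap)
{
	assert(f_mode != 0); /* a file is opened for read, write, or both */

	if (where == MODEL_CHECK_AT_OPEN)
		return model_perm_check(f_mode, has_cap);
	return 0;
}

/* Models writing a trigger to an already opened file. */
int model_write_trigger(enum model_check_point where, unsigned int f_mode,
			bool has_cap)
{
	/* In either variant, writing needs a descriptor opened for write. */
	if (!(f_mode & MODEL_FMODE_WRITE))
		return -EBADF;

	/* Late-check variant: the error surfaces here, not in open(). */
	if (where == MODEL_CHECK_AT_WRITE)
		return model_perm_check(f_mode, has_cap);
	return 0;
}
```

In this model an unprivileged writer gets -EPERM from model_open() with
MODEL_CHECK_AT_OPEN, while with MODEL_CHECK_AT_WRITE the open succeeds
and the same -EPERM is returned by model_write_trigger(). Readers are
unaffected in both variants.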