On Thu, Oct 26, 2023 at 9:49 AM Luca Boccassi <bluca@xxxxxxxxxx> wrote: > > On Thu, 26 Oct 2023 at 17:41, Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > > > > 519fabc7aaba ("psi: remove 500ms min window size limitation for > > triggers") breaks unprivileged psi polling on cgroups. > > > > Historically, we had a privilege check for polling in the open() of a > > pressure file in /proc, but were erroneously missing it for the open() > > of cgroup pressure files. > > > > When unprivileged polling was introduced in d82caa273565 ("sched/psi: > > Allow unprivileged polling of N*2s period"), it needed to filter > > privileges depending on the exact polling parameters, and as such > > moved the CAP_SYS_RESOURCE check from the proc open() callback to > > psi_trigger_create(). Both the proc files as well as cgroup files go > > through this during write(). This implicitly added the missing check > > for privileges required for HT polling for cgroups. > > > > When 519fabc7aaba ("psi: remove 500ms min window size limitation for > > triggers") followed right after to remove further restrictions on the > > RT polling window, it incorrectly assumed the cgroup privilege check > > was still missing and added it to the cgroup open(), mirroring what we > > used to do for proc files in the past. > > > > As a result, unprivileged poll requests that would be supported now > > get rejected when opening the cgroup pressure file for writing. Ah, I see the problem. In our discussion https://lore.kernel.org/all/ZADj4YX4uftK%2FFrh@xxxxxxxxxxx/ we decided to have the check in open() to fail early but we never considered unprivileged processes which only poll and never create any triggers. Makes sense. > > > > Remove the cgroup open() check. psi_trigger_create() handles it. > > > > Fixes: 519fabc7aaba ("psi: remove 500ms min window size limitation for triggers") > > Cc: stable@xxxxxxxxxxxxxxx # 6.5+ > > Reported-by: Luca Boccassi <bluca@xxxxxxxxxx> > > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx> > > Acked-by: Luca Boccassi <bluca@xxxxxxxxxx> Acked-by: Suren Baghdasaryan <surenb@xxxxxxxxxx> > > Thank you very much for the quick fix - this was reported originally > on the systemd bug tracker by Daniel Black (I do not have an email > address): > > https://github.com/systemd/systemd/issues/29723 > > It is very important for systemd services to be able to do this > without capabilities, as using capabilities means in turn user > namespaces cannot be used (PrivateUsers=yes in systemd parlance). > > Kind regards, > Luca Boccassi