Re: Resource limits getting enforced only for processes in user's terminal not for su [user] from root's terminal

Create the cgroups *through systemd*, by creating .slice units for that purpose.

You can either create individual slices for each user, or you can enable Delegate= on a slice and then systemd will allow you to manage your own sub-cgroups inside.
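For example (a sketch only; the slice name and the limit values are hypothetical), a .slice unit replacing a hand-made cgroup could look like:

```ini
# /etc/systemd/system/mygroup.slice  (hypothetical name)
[Unit]
Description=Managed replacement for a manually created cgroup

[Slice]
# Accounting implicitly enables the corresponding controllers,
# and systemd keeps them enabled across daemon-reload.
CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
TasksAccounting=yes
CPUQuota=50%
MemoryHigh=1G
TasksMax=100
```

Processes can then be started inside it with e.g. `systemd-run --slice=mygroup.slice <command>`, rather than echoing PIDs into cgroup.procs by hand.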

On Fri, May 5, 2023 at 10:16 AM jaimin bhaduri <jaimin@xxxxxxxxxx> wrote:
I created a cgroup named mycgroup using 'mkdir /sys/fs/cgroup/mycgroup'.
'ls /sys/fs/cgroup/mycgroup' shows only the memory and pids interface files; the cpu and io files are missing.

They are visible after I execute 'echo +cpu +io > /sys/fs/cgroup/cgroup.subtree_control'.

But 'systemctl daemon-reload' deletes the cpu and io files again.
Executing 'echo +cpu +io > /sys/fs/cgroup/cgroup.subtree_control' once more brings the files back, but the values in cpu.max and io.max are now reset to their defaults.

This happens to every cgroup I create.
How do I enable the cpu, io, memory, and pids controllers for the entire cgroup hierarchy, so that a daemon reload or any other event does not delete those files from any of my created cgroups?

On Tue, May 2, 2023 at 12:54 PM jaimin bhaduri <jaimin@xxxxxxxxxx> wrote:
OK, I understand now.

Using PHP, I created a cgroup for every user under /sys/fs/cgroup, named after the username, and set values in their cpu.max, memory.max, memory.high, pids.max, etc.
I then made the service file below, which moves users' PIDs into their cgroups. For example, PIDs of user5 are appended to /sys/fs/cgroup/user5/cgroup.procs.
I do this for all users in a loop every 5 seconds, as per the configuration below.

Content of /etc/systemd/system/cgroups.service:
[Unit]
Description=Move processes of user to cgroup

[Service]
Type=simple
User=root
ExecStart=/bin/bash -c 'while true; do \
  for u in user1 user2 user3 user4 user5; do \
    pgrep -u $$u | grep -vxFf /sys/fs/cgroup/$$u/cgroup.procs \
      | xargs -I{} sh -c "echo {} >> /sys/fs/cgroup/$$u/cgroup.procs"; \
  done; \
  sleep 5; \
done'

[Install]
WantedBy=multi-user.target
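The filter step in that pipeline can be checked in isolation. A minimal sketch (the /tmp file names are stand-ins for the real cgroup.procs file and the pgrep output):

```shell
# Stand-ins for the real files: PIDs already in the cgroup,
# and all PIDs pgrep would report for the user right now.
printf '101\n205\n' > /tmp/cgroup.procs.demo
printf '101\n205\n309\n' > /tmp/pgrep.out.demo

# grep -vxFf keeps only whole-line (-x), fixed-string (-F) non-matches (-v),
# i.e. the PIDs that are not yet listed in cgroup.procs.
new_pids=$(grep -vxFf /tmp/cgroup.procs.demo /tmp/pgrep.out.demo)
echo "$new_pids"
```

Each remaining PID still has to be written to cgroup.procs one at a time, which is why the pipeline uses xargs -I{} with a separate echo per PID.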


This solution works. But is it a good way to enforce resource limits on users? In some cases there can be more than 100 users.



On Tue, Apr 25, 2023 at 9:33 AM Mantas Mikulėnas <grawity@xxxxxxxxx> wrote:


On Tue, Apr 25, 2023, 06:44 jaimin bhaduri <jaimin@xxxxxxxxxx> wrote:
/etc/systemd/system/user-1000.slice.d/override.conf:
[Unit]
Description=User Slice for UID 1000

[Slice]
CPUAccounting=1
MemoryAccounting=1
IOAccounting=1
TasksAccounting=1
CPUQuota=55%
MemoryMax=
MemoryHigh=1G
IOReadBandwidthMax=
IOWriteBandwidthMax=
IOReadIOPSMax=
IOWriteIOPSMax=
TasksMax=100

[Install]
WantedBy=multi-user.target

/etc/systemd/user/aa.service:
[Unit]
Description=Resource limits for user aa

[Service]
Slice=user-1000.slice
Environment=USER_UID=1000
User=%i
WorkingDirectory=%h
Type=simple
ExecStart=/bin/bash -c 'echo "User %EUID %USER_UID" && sudo -u \#$USER_UID $SHELL'
Restart=always
RestartSec=10

[Install]
WantedBy=default.target


I made the above-mentioned override.conf (slice file) and aa.service file for the user named 'aa'.
Then I executed 'systemctl --user enable aa.service', 'systemctl --user daemon-reload' and 'systemctl daemon-reload'.
From the user's terminal I executed 'stress -c 1'; in root's terminal, 'top' showed that CPU usage did not exceed 55%.
But after doing 'su aa' from root's terminal, CPU usage was 100%.
What am I doing wrong? Is there some syntax or coding error in my service file?

Doing `su aa` doesn't start aa.service! I don't know where you got the idea that it would. Users aren't services.

There may be cron jobs of that user which get executed at 12 am at night.

Cron calls pam_systemd, so it should be fine.

Or there may be scheduled backups for that user which run every month/week at some particular time via a PHP script.

Why is *that* not a cronjob, or even a service?

I just want the user's processes to follow the resource limits set in the slice file, regardless of how or where they start, and regardless of whether that user is logged in or not.

There is no nice way to achieve this. If a process isn't in the cgroup then it just isn't in the cgroup – something has to *deliberately* move it into that cgroup for its limits to apply.

The kernel has no such functionality built in, as far as I know. Processes stay in the cgroup they were spawned in by design, precisely so that they can't *escape* limits.

Maybe check if there is some external daemon (cgmanager, maybe?) that would scan all newly created processes and move them to the desired cgroup as quickly as it can.

I am new to this; please help.

On Mon, Apr 24, 2023 at 11:54 AM Mantas Mikulėnas <grawity@xxxxxxxxx> wrote:
On Mon, Apr 24, 2023 at 7:04 AM jaimin bhaduri <jaimin@xxxxxxxxxx> wrote:
Cgroups v2 is enabled in almalinux 9.1 with 5.14.0-70.22.1.el9_0.x86_64 kernel and systemd 250 (250-12.el9_1.3).

Content of /etc/systemd/system/user-1002.slice.d/override.conf:
[Unit]
Description=User Slice for UID 1002

[Slice]
CPUAccounting=1
MemoryAccounting=1
IOAccounting=1
TasksAccounting=1
CPUQuota=70%
MemoryMax=1G
MemoryHigh=1G
IOReadBandwidthMax=/ 1G
IOWriteBandwidthMax=/ 1G
IOReadIOPSMax=/ 1000
IOWriteIOPSMax=/ 1000
TasksMax=200

[Install]
WantedBy=multi-user.target


I execute 'systemctl daemon-reload' after saving the slice file.
Every value is enforced for the user when I test by running some commands from the user's terminal.
But they don't work when I run the same commands from root's terminal after doing su to that user.
They also don't work when a user's process is started from a PHP script using putenv('user_uid');.
How do I make them work for all the user's processes, no matter how they start?

Using cgroup-based limits means that something needs to actually *move* the process into the appropriate cgroup. (They are not uid-based limits!)

As php-fpm does not support cgroup management on its own, you might need to run multiple instances of php-fpm@.service (not just multiple pools in the same instance), each instance specifying "Slice=user-%i.slice" similar to how user@.service does it.
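As a sketch of that layout (assuming the instance name is the user's UID, so php-fpm@1000.service serves UID 1000), the drop-in might look like:

```ini
# /etc/systemd/system/php-fpm@.service.d/slice.conf  (hypothetical drop-in)
[Service]
# %i expands to the instance name, e.g. "1000" for php-fpm@1000.service,
# so each instance lands in the matching user-<uid>.slice.
Slice=user-%i.slice
```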

For `su`, you would need to configure its PAM stack to invoke pam_systemd, but this is usually *deliberately* not done, as doing so would cause other issues, especially for scripts that use `su` for non-interactive purposes. (Besides that, systemd-logind does not allow creating a new session from within another one, so the only time `su` would be allowed to do this is exactly the time when it would be undesirable...)

Instead, `machinectl shell foo@` or `systemd-run --user -M foo@.host --pty ...` could be used if you need to manually run something as another user (but as soon as you need to do it twice, you should just make a .service with Slice=, or even a --user service).
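As a sketch of the second form (the user name "foo" and the command are placeholders; the wrapper echoes the command line rather than executing it, since actually running it needs a live systemd and logind):

```shell
# Build the systemd-run command line for running something as another
# user inside that user's own service manager, so slice limits apply.
run_as() {
    user="$1"; shift
    echo systemd-run --user -M "${user}@.host" --pty "$@"
}

cmdline=$(run_as foo stress -c 1)
echo "$cmdline"
```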

--
Mantas Mikulėnas


--
Mantas Mikulėnas
