Hi Sebastian,

> -----Original Message-----
> From: Frederic Weisbecker <frederic@xxxxxxxxxx>
> Sent: Friday, November 12, 2021 5:37 PM
> To: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Cc: Moessbauer, Felix (T RDA IOT SES-DE) <felix.moessbauer@xxxxxxxxxxx>;
> cgroups@xxxxxxxxxxxxxxx; linux-rt-users@xxxxxxxxxxxxxxx; Schild, Henning (T
> RDA IOT SES-DE) <henning.schild@xxxxxxxxxxx>; Kiszka, Jan (T RDA IOT)
> <jan.kiszka@xxxxxxxxxxx>; Schmidt, Adriaan (T RDA IOT SES-DE)
> <adriaan.schmidt@xxxxxxxxxxx>; Zefan Li <lizefan.x@xxxxxxxxxxxxx>; Tejun
> Heo <tj@xxxxxxxxxx>; Johannes Weiner <hannes@xxxxxxxxxxx>; Waiman Long
> <longman@xxxxxxxxxx>
> Subject: Re: Questions about replacing isolcpus by cgroup-v2
>
> On Fri, Nov 12, 2021 at 04:36:56PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2021-11-04 17:29:08 [+0000], Moessbauer, Felix wrote:
> > > Dear subscribers,
> > Hi,
> >
> > I Cced cgroups@vger since this question fits there better.
> > I Cced Frederic in case he has some clues regarding isolcpus and
> > cgroups.
> >
> > > we are currently evaluating how to rework realtime tuning to use cgroup-v2
> > > cpusets instead of the isolcpus kernel parameter.
> > > Our use case is realtime applications with RT and non-RT threads. The
> > > non-RT thread might create additional non-RT threads:
> > >
> > > Example (RT CPU=1, 4 CPUs):
> > > - Non-RT Thread (A) with default affinity 0xD (1101b)
> > > - RT Thread (B) with Affinity 0x2 (0010b, via set_affinity)
> > >
> > > When using pure isolcpus and cgroup-v1, just setting isolcpus=1 works
> > > perfectly:
> > > Thread A gets affinity 0xD, Thread B gets 0x2 and additional threads get a
> > > default affinity of 0xD.
> > > That way, independent of the threads' priorities, we can ensure that nothing
> > > is scheduled on our RT CPU (except for kernel threads, etc.).
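
For illustration only, the mask arithmetic in the example above (4 CPUs, RT CPU=1) can be sketched in Python. The helper names mask_to_cpus and default_affinity are made up for this sketch and are not part of any kernel or library interface; the sketch only reproduces the numbers, it does not change any affinity.

```python
def mask_to_cpus(mask):
    """Return the set of CPU indices whose bits are set in an affinity mask."""
    return {cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1}


def default_affinity(nr_cpus, isolated):
    """Mask of all CPUs minus the isolated ones, i.e. the default that
    isolcpus= leaves for ordinary threads in the example."""
    all_cpus = (1 << nr_cpus) - 1
    isolated_mask = sum(1 << cpu for cpu in set(isolated))
    return all_cpus & ~isolated_mask


# Example from the mail: 4 CPUs, CPU 1 reserved for the RT thread.
thread_a = default_affinity(4, {1})   # non-RT thread A and its children
thread_b = 0x2                        # RT thread B, pinned explicitly
print(hex(thread_a), sorted(mask_to_cpus(thread_a)))
print(hex(thread_b), sorted(mask_to_cpus(thread_b)))
```

With these inputs the non-RT mask comes out as 0xD (CPUs 0, 2, 3) and the RT mask as 0x2 (CPU 1), matching the example.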
> > >
> > > During this journey, we discovered the following:
> > >
> > > Using cgroup-v2 cpusets and isolcpus together seems to be incompatible:
> > > When activating the cpuset controller on a cgroup (for the first time), all
> > > default CPU affinities are reset.
> > > As a result, the default affinity is also set to 0xFFFF..., while with
> > > isolcpus we expect it to be (0xFFFF - isolcpus).
> > > This breaks the example from above, as now the non-RT thread can
> > > also be scheduled on the RT CPU.
>
> That sounds buggy from the cpuset-v2 side (adding the maintainers in Cc).
>
> Also please have a look into "[PATCH v8 0/6] cgroup/cpuset: Add new cpuset
> partition type & empty effecitve cpus":
>
> https://lore.kernel.org/lkml/20211018143619.205065-1-longman@redhat.com/
>
> This stuff adds support for a new "isolated" partition type on cpuset/cgroup-v2
> which should behave just like isolcpus.

I already tested the patch and reported back on the ML.
However, it only covers the load-balancing aspects, not the isolcpus-like default affinities.
When setting cpuset.cpus.partition=isolated, you get similar behavior as with =root (CPUs are removed from all other groups), but the scheduler's load balancing is disabled for this domain.
For details, please have a look at the other thread.

> > >
> > > When only using cgroup-v2, we can isolate our RT process by placing it in a
> > > cgroup with CPUs=0,1 and remove CPU=1 from all other cgroups.
> > > However, we do not know of a strategy to set a default affinity:
> > > Given the example above, we have no way to ensure that newly created
> > > threads are born with an affinity of just 0x2 (without changing the
> > > application).
> > >
> > > Finally, isolcpus itself is deprecated since kernel 5.4.
> >
> > Where is the deprecation of isolcpus announced/written?
>
> We tried to deprecate it but too many people are still using it. Better pick an
> interface that allows you to change the isolated set at runtime like
> cpuset.sched_load_balance on cpuset/cgroup-v1 or the above patchset on v2.

Currently, the only workaround we know of to get isolcpus semantics on systems where other tools like container runtimes or libvirt fiddle around with the cpuset controller is to simply enforce cgroup-v1.
But maybe we are just running into the bug from above.

Felix

> Thanks.
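
As a rough sketch of the default-affinity idea discussed above, and not something proposed in this thread: a small launcher can narrow its own affinity to the non-isolated CPUs and then exec the unmodified application, since the mask is inherited across exec and by newly created threads. The function names are hypothetical; os.sched_getaffinity, os.sched_setaffinity and os.execvp are the standard Python calls (Linux only). Note the assumption that nothing resets the mask afterwards, which is exactly what enabling the cpuset controller is reported to do above, so this does not address that problem.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as "0,2-3" into a set of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def housekeeping_cpus(isolated, available):
    """CPUs a non-RT thread may use: everything available minus the RT CPUs."""
    return set(available) - set(isolated)


def exec_with_default_affinity(isolated_list, argv):
    """Restrict this process to the non-isolated CPUs, then exec argv.

    Assumption: the caller's current affinity still includes the
    housekeeping CPUs and no other tool rewrites it later.
    """
    allowed = housekeeping_cpus(parse_cpu_list(isolated_list),
                                os.sched_getaffinity(0))
    if not allowed:
        raise ValueError("no CPU left after removing the isolated set")
    os.sched_setaffinity(0, allowed)
    os.execvp(argv[0], argv)  # does not return on success
```

A call such as exec_with_default_affinity("1", ["./rt_app"]) (with ./rt_app as a placeholder binary) would start the application with a default mask of 0xD on the 4-CPU example. The RT thread should still be able to move itself to CPU 1 via its own set_affinity call, as long as the cpuset it lives in allows that CPU.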