On 28/07/22 11:39, Tejun Heo wrote:
> Hello, Waiman.
>
> On Thu, Jul 28, 2022 at 05:04:19PM -0400, Waiman Long wrote:
>> > So, the patch you proposed is making the code remember one special aspect of
>> > user requested configuration - whether it configured it or not, and trying
>> > to preserve that particular state as cpuset state changes. It addresses the
>> > immediate problem but it is a very partial approach. Let's say a task wanna
>> > be affined to one logical thread of each core and set its mask to 0x5555.
>> > Now, let's say cpuset got enabled and enforced 0xff and affined the task to
>> > 0xff. After a while, the cgroup got more cpus allocated and its cpuset now
>> > has 0xfff. Ideally, what should happen is the task now having the effective
>> > mask of 0x555. In practice, tho, it either would get 0xf55 or 0x55 depending
>> > on which way we decide to misbehave.
>>
>> OK, I see what you want to accomplish. To fully address this issue, we will
>> need to have a new cpumask variable in the the task structure which will be
>> allocated if sched_setaffinity() is ever called. I can rework my patch to
>> use this approach.
>
> Yeah, we'd need to track what user requested separately from the currently
> effective cpumask. Let's make sure that the scheduler folks are on board
> before committing to the idea tho. Peter, Ingo, what do you guys think?
>

FWIW on the runtime overhead side of things I think it'll be OK, as that
should be just an extra mask copy in sched_setaffinity() and a subset check /
cpumask_and() in set_cpus_allowed_ptr().

The policy side is a bit less clear: when, if ever, do we clear the
user-defined mask? Will it keep haunting us even after moving a task to a
disjoint cpuset partition?

There's also the question of if/how that new mask should be exposed, because
attaching a task to a cpuset will now yield a not-necessarily-obvious
affinity. E.g. in the thread affinity example above, if the initial affinity
setting was done ages ago by some system tool, IMO the user needs a way to
expect/understand why the result is 0x555 rather than 0xfff.

While I'm saying this, I don't think anything exposes p->user_cpus_ptr, but
then again that one is for "special" hardware...

> Thanks.
>
> --
> tejun
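
To make the 0x5555 example concrete, here is a rough userspace sketch of the "remember what the user asked for" idea. It is only a model under assumptions: struct task_model and the model_*() helpers are hypothetical names, masks are plain 64-bit integers rather than struct cpumask, and falling back to the whole cpuset when the intersection is empty is just one possible answer to the open policy question, not something the thread has settled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
struct task_model {
	uint64_t user_mask;	/* last mask requested via "setaffinity" */
	bool user_mask_set;	/* false until the user ever requests one */
	uint64_t effective;	/* mask the task is actually allowed on */
};

/*
 * Model of a cpuset change: one AND against the remembered request.
 * ASSUMPTION: an empty intersection falls back to the full cpuset.
 */
static void model_cpuset_update(struct task_model *t, uint64_t cpuset)
{
	uint64_t m = t->user_mask_set ? (t->user_mask & cpuset) : cpuset;

	t->effective = m ? m : cpuset;
}

/* Model of the user request: one extra mask copy, then the same AND. */
static void model_setaffinity(struct task_model *t, uint64_t req,
			      uint64_t cpuset)
{
	t->user_mask = req;
	t->user_mask_set = true;
	model_cpuset_update(t, cpuset);
}

/* Walk through the scenario quoted above. */
static void model_demo(void)
{
	struct task_model t = { 0 };

	model_setaffinity(&t, 0x5555, 0xff);	/* cpuset enforces 0xff */
	assert(t.effective == 0x55);

	model_cpuset_update(&t, 0xfff);		/* cgroup gets more CPUs */
	assert(t.effective == 0x555);		/* not 0xf55, not 0x55 */
}
```

With the request kept separately, widening the cpuset recomputes the effective mask from the original 0x5555 instead of from whatever the previous clamp left behind, which is where the 0x555 result comes from.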