On Mon, Nov 30 2020 at 09:31, Christoph Lameter wrote:
> On Fri, 27 Nov 2020, Marcelo Tosatti wrote:
>
>> Decided to switch to prctl interface, and then it starts
>> to become similar to "task mode isolation" patchset API.
>
> Right I think that was a good approach.

prctl() is the right thing to do.

>> In addition to quiescing pending activities on the CPU, it would
>> also be useful to assign a per-task attribute (which is then assigned
>> to a per-CPU attribute), indicating whether that CPU is running
>> an isolated task or not.
>
> Sounds good but what would this do? Give a warning like the isolation
> patchset?

This all needs a lot more thought about the overall picture. We already have too many knobs and ad hoc hooks which fiddle with isolation. The current CPU isolation is a best effort approach, and I agree that for stricter isolation modes we need to be able to enforce it, hunt down offenders and think about them one by one.

>> To be called before real time loop, one would have:

Can we please agree in the first place that "real time" is absolutely the wrong term here? It's about running undisturbed CPU bound computations of whatever nature. It does not matter whether that loop does busy polling à la DPDK, runs a huge math computation on a data set or whatever else people come up with.

>> prctl(PR_SET_TASK_ISOLATION, ISOLATION_ENABLE) [1]
>> real time loop
>> prctl(PR_SET_TASK_ISOLATION, ISOLATION_DISABLE)
>>
>> (with the attribute also being cleared on task exit).
>>
>> The general description would be:
>>
>> "Set task isolated mode for a given task, returning an error
>> if the task is not pinned to a single CPU.

Plus returning an error if the task has no permission to request this. This should never be an unprivileged prctl.

>> In this mode, the kernel will avoid interruptions to isolated
>> CPUs when possible."
>>
>> Any objections against such an interface ?
>
> Maybe do both like in the isolation patchset?
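The proposed description makes the prctl() fail unless the task is pinned to a single CPU. As a user-space sketch of that precondition, a task could pin itself to the CPU it is currently running on and verify its affinity with the standard glibc calls before issuing the isolation request. PR_SET_TASK_ISOLATION is only the name proposed in this thread, so the sketch stops short of calling it. The helper names are hypothetical, and the fixed-size cpu_set_t assumes a machine with at most CPU_SETSIZE (1024) CPUs.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>

/* Hypothetical helper: true if the calling thread may run on exactly one CPU. */
static bool task_pinned_to_single_cpu(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return false;
    return CPU_COUNT(&set) == 1;
}

/* Hypothetical helper: restrict the calling thread to the CPU it runs on now. */
static int pin_to_current_cpu(void)
{
    cpu_set_t set;
    int cpu = sched_getcpu();

    if (cpu < 0)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

A task would call pin_to_current_cpu(), confirm task_pinned_to_single_cpu(), and only then issue whatever isolation prctl() is eventually merged before entering its loop.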
We really want to define the scopes first. And here you go:

> Often code can tolerate a few interruptions (in some code branches
> regular syscalls may be needed) but one wants the thread to be
> as quiet as possible.

So you say some code can tolerate a few interrupts, then Alex comes along and says 'no disturbance' at all.

The point is that all of this shares the mechanisms to quiesce certain parts of the kernel, so this calls for common infrastructure. The mode argument of prctl(ISOLATION, MODE) defines the scope of isolation which the task asks for. The infrastructure decides whether it can be granted and, if so, orchestrates the operation and provides common infrastructure for instrumentation, violation monitoring, etc.

We really need to stop looking at particular workloads and defining ad hoc solutions tailored to their particular itch if we don't want to end up with an uncoordinated and unmaintainable zoo of interfaces, hooks and knobs.

Just look at the problem at hand as an example. NOHZ already issues quiet_vmstat(), but it does not cancel already scheduled work. Now Marcelo wants a new mechanism which is supposed to cancel the work, and then Alex wants to prevent it from being rescheduled. If that's not properly coordinated, this goes down the drain very fast.

So can we please come up with a central place to handle this prctl() with a future-proof argument list, so the various isolation needs can be expressed as required? That allows Marcelo to start tackling the vmstat side, and Alex can utilize that and build the other parts into it piece by piece.

Thanks,

tglx
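A minimal user-space model of the central prctl() handling argued for above might look like the following. It assumes hypothetical scope bits for the vmstat example. Unknown mode bits are rejected, so new scopes can be added later without breaking the ABI. Requests from unprivileged or unpinned tasks are refused. The handler, not the caller, ties "prevent rescheduling" to "cancel pending work". None of the ISOL_* names, struct isol_task or isolation_set_mode() exist in the kernel; they are illustrative only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scope bits; these do not exist in the kernel. */
#define ISOL_F_QUIESCE_VMSTAT  (1u << 0) /* cancel already scheduled vmstat work */
#define ISOL_F_NO_VMSTAT_REARM (1u << 1) /* keep vmstat work from being rescheduled */
#define ISOL_F_KNOWN_MASK      (ISOL_F_QUIESCE_VMSTAT | ISOL_F_NO_VMSTAT_REARM)

/* Hypothetical model of the per-task state the central handler inspects. */
struct isol_task {
    bool privileged;      /* stands in for a real capability check */
    int nr_allowed_cpus;  /* number of CPUs in the task's affinity mask */
    uint32_t granted;     /* isolation scope currently granted */
};

/* Hypothetical central handler for prctl(ISOLATION, MODE). */
static int isolation_set_mode(struct isol_task *t, uint32_t mode)
{
    if (mode & ~ISOL_F_KNOWN_MASK)
        return -EINVAL;          /* unknown bits: reject, keeps the ABI extensible */
    if (mode == 0) {
        t->granted = 0;          /* dropping isolation needs no privilege */
        return 0;
    }
    if (!t->privileged)
        return -EPERM;           /* never an unprivileged request */
    if (t->nr_allowed_cpus != 1)
        return -EINVAL;          /* task must be pinned to a single CPU */
    /* Coordination decided here: blocking re-arm implies cancelling pending work. */
    if (mode & ISOL_F_NO_VMSTAT_REARM)
        mode |= ISOL_F_QUIESCE_VMSTAT;
    t->granted = mode;           /* a real implementation would orchestrate each part */
    return 0;
}
```

Keeping the dependency between the two vmstat bits inside the handler is the design choice this sketch illustrates: callers only state the scope they need, and the mechanisms cannot be requested in an inconsistent combination.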