Re: [PATCH] mm: introduce sysctl file to flush per-cpu vmstat statistics

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Wed, 02 Dec 2020 16:57:31 +0100

On Mon, Nov 30 2020 at 09:31, Christoph Lameter wrote:
> On Fri, 27 Nov 2020, Marcelo Tosatti wrote:
>
>> Decided to switch to prctl interface, and then it starts
>> to become similar to "task mode isolation" patchset API.
>
> Right I think that was a good approach.

prctl() is the right thing to do.

>> In addition to quiescing pending activities on the CPU, it would
>> also be useful to assign a per-task attribute (which is then assigned
>> to a per-CPU attribute), indicating whether that CPU is running
>> an isolated task or not.
>
> Sounds good but what would this do? Give a warning like the isolation
> patchset?

This all needs a lot more thought about the overall picture. We already
have too many knobs and ad hoc hooks which fiddle with isolation.

The current CPU isolation is a best-effort approach, and I agree that
for stricter isolation modes we need to be able to enforce that, hunt
down offenders and think about them one by one.

>> To be called before real time loop, one would have:

Can we please agree in the first place, that "real time" is absolutely
the wrong term here?

It's about running undisturbed CPU-bound computations of whatever
nature. It does not matter whether that loop does busy polling à la
DPDK, whether it runs a huge math computation on a data set or
whatever people come up with.

>> 	prctl(PR_SET_TASK_ISOLATION, ISOLATION_ENABLE) [1]
>> 	real time loop
>> 	prctl(PR_SET_TASK_ISOLATION, ISOLATION_DISABLE)
>>
>> (with the attribute also being cleared on task exit).
>>
>> The general description would be:
>>
>> "Set task isolated mode for a given task, returning an error
>> if the task is not pinned to a single CPU.

Plus returning an error if the task has no permissions to request
this. This should not be an unprivileged prctl ever.
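
To make the two rejection cases concrete, here is a minimal userspace
model of the checks, not kernel code. Everything in it is hypothetical:
struct isol_task, the may_isolate flag (a stand-in for whatever
capability check ends up being used) and the choice of errno values are
assumptions for illustration only.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the admission checks; not an existing kernel API. */
struct isol_task {
	unsigned long cpus_allowed;	/* affinity mask, one bit per CPU */
	bool may_isolate;		/* stand-in for a capability check */
};

/* True if exactly one bit is set in the affinity mask. */
static bool pinned_to_single_cpu(unsigned long mask)
{
	return mask && !(mask & (mask - 1));
}

/* Returns 0 if isolation may be entered, a negative errno otherwise. */
static int isol_check_request(const struct isol_task *t)
{
	if (!t->may_isolate)
		return -EPERM;		/* never an unprivileged operation */
	if (!pinned_to_single_cpu(t->cpus_allowed))
		return -EINVAL;		/* task must be pinned to one CPU */
	return 0;
}
```

The permission check comes first so that an unprivileged caller learns
nothing about the affinity handling.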

>> In this mode, the kernel will avoid interruptions to isolated
>> CPUs when possible."
>>
>> Any objections against such an interface ?
>
> Maybe do both like in the isolation patchset?

We really want to define the scopes first. And here you go:

> Often code can tolerate a few interruptions (in some code branches
> regular syscalls may be needed) but one wants the thread to be
> as quiet as possible.

So you say some code can tolerate a few interrupts, then comes Alex and
says 'no disturbance' at all.

The point is that all of this shares the mechanisms to quiesce certain
parts of the kernel, so this wants to build common infrastructure. The
prctl(ISOLATION, MODE) mode argument defines the scope of isolation
which the task asks for. The infrastructure decides whether it can be
granted and, if so, orchestrates the operation and provides a common
infrastructure for instrumentation, violation monitoring etc.
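
As a rough sketch of that split between "mode defines the scope" and
"infrastructure decides and orchestrates", consider the userspace model
below. The ISOL_QUIESCE_* bits, struct isol_feature and isol_enter() are
invented names, and the two demo callbacks only count invocations; none
of this is an existing interface.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical scope bits for the mode argument; names are illustrative. */
#define ISOL_QUIESCE_VMSTAT	(1u << 0)
#define ISOL_QUIESCE_TICK	(1u << 1)
#define ISOL_KNOWN_MASK		(ISOL_QUIESCE_VMSTAT | ISOL_QUIESCE_TICK)

/* One entry per subsystem which knows how to quiesce itself. */
struct isol_feature {
	unsigned int bit;
	int (*quiesce)(int cpu);	/* 0 on success, negative errno otherwise */
};

/* Demo callbacks: they only count how often they were invoked. */
static int vmstat_flushed, tick_stopped;

static int demo_quiesce_vmstat(int cpu) { (void)cpu; vmstat_flushed++; return 0; }
static int demo_quiesce_tick(int cpu) { (void)cpu; tick_stopped++; return 0; }

static const struct isol_feature isol_features[] = {
	{ ISOL_QUIESCE_VMSTAT, demo_quiesce_vmstat },
	{ ISOL_QUIESCE_TICK, demo_quiesce_tick },
};

/* Central entry point: validate the requested scope, then orchestrate. */
static int isol_enter(unsigned int mode, int cpu)
{
	size_t i;

	if (mode & ~ISOL_KNOWN_MASK)
		return -EINVAL;		/* unknown scope bits are rejected */

	for (i = 0; i < sizeof(isol_features) / sizeof(isol_features[0]); i++) {
		int ret;

		if (!(mode & isol_features[i].bit))
			continue;
		ret = isol_features[i].quiesce(cpu);
		if (ret)
			return ret;	/* this scope cannot be granted */
	}
	return 0;
}
```

A task which only cares about vmstat would ask for ISOL_QUIESCE_VMSTAT
and leave the tick alone. A stricter mode is then just more bits going
through the same entry point, which is also the natural place to hang
instrumentation and violation monitoring.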

We really need to stop looking at particular workloads and defining
ad hoc solutions tailored to their particular itch if we don't want to
end up with an uncoordinated and unmaintainable zoo of interfaces, hooks
and knobs.

Just looking at the problem at hand as an example. NOHZ already issues
quiet_vmstat(), but it does not cancel already scheduled work. Now
Marcelo wants a new mechanism which is supposed to cancel the work and
then Alex wants to prevent it from being rescheduled. If that's not
properly coordinated this goes down the drain very fast.

So can we please come up with a central place to handle this prctl()
with a future-proof argument list so the various isolation needs can be
expressed as required?
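
One way to get a future-proof argument list is the size-versioned struct
pattern which clone3() and openat2() use via copy_struct_from_user().
The sketch below models those semantics in plain userspace C. struct
isol_args, its fields and isol_copy_args() are assumptions made up for
illustration, not a proposed ABI.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical argument block; new fields would only be appended. */
struct isol_args {
	uint64_t mode;
	uint64_t flags;
};

/*
 * Copy a caller-supplied struct of usize bytes into *dst:
 *  - an older, shorter struct is accepted and the tail is zero-filled
 *  - a newer, longer struct is accepted only if the unknown tail is zero
 */
static int isol_copy_args(struct isol_args *dst, const void *src, size_t usize)
{
	const unsigned char *p = src;
	size_t ksize = sizeof(*dst);
	size_t n = usize < ksize ? usize : ksize;
	size_t i;

	for (i = ksize; i < usize; i++)
		if (p[i])
			return -E2BIG;	/* caller uses a field we do not know */

	memset(dst, 0, ksize);
	memcpy(dst, src, n);
	return 0;
}
```

With that, old binaries keep working when fields are appended, and new
binaries fail cleanly on old kernels instead of being silently
misinterpreted.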

That allows Marcelo to start tackling the vmstat side and Alex can
utilize that and build the other parts into it piece by piece.

Thanks,

        tglx