On Mon, 2020-11-30 at 15:29 -0300, Marcelo Tosatti wrote:
> On Mon, Nov 30, 2020 at 03:18:58PM -0300, Marcelo Tosatti wrote:
> > On Sat, Nov 28, 2020 at 03:49:38AM +0000, Alex Belits wrote:
>
> Hi Alex,
>
> Say, couldn't a notification from the trace-latency infrastructure,
> notifying the admin that latency was exceeded due to interruptions
>
> x us (backtrace of x) + y us (backtrace of y) + z us (backtrace of z)
> >= maxlatency us

I believe that, for performance and readability reasons, we may want to replace the backtrace with a "cause" record that lists the specific recognized cause of entering the kernel (page fault, interrupt, etc.) along with event-specific arguments that don't show up in a backtrace, such as the syscall number, faulting address, interrupt number, IPI call, or timer. And then, optionally, a backtrace.

It should also be taken into account that a backtrace is only useful if it is taken at the right point. We already know exactly what the backtrace is right after entering the kernel, in the task flags processing loop, or right before the exit. To determine anything important we have to do something (and possibly record a backtrace) in a specific handler, syscall, etc., but there we mostly care about the last function anyway.

> With an application which continues to handle traffic, be
> as functional as the signal? (then again, don't know exactly what
> you do in the signal handler...).

It could, if "the admin" is actually a manager process using an interface such as netlink to collect this information from the kernel. An isolated process wouldn't be able to use this interface until it knows that it has exited isolation, because communication with the kernel involves a syscall, so it will still need something else -- a notification through shared memory from the manager process, or possibly through the vdso, although touching the vdso often may affect performance.

There is one more piece of information that I want to record: the remote cause. The remote cause is whatever was set by _another_ CPU before calling this one, possibly with its backtrace.
In that case the backtrace is useful because we already know that we are sending the IPI; what we need to know is from where and how it was called.

> Could also count "enter kernel <-> exit kernel" window as an
> interruption in such a scheme.

This is pretty much what the current patch does, conditionally on the isolated state set in per-CPU flags, now that the entry and exit hooks are in their places. If tracing/logging is done, timing can be collected there, and causes can be determined in a specific handler for them. We can add more flags for turning this mechanism on without full task isolation.

-- 
Alex