RE: [PATCH v8 06/14] task_isolation: provide strict mode configurable signal

Gilad Ben Yossef <giladb@xxxxxxxxxx> · Sat, 24 Oct 2015 09:16:23 +0000

Hi Andy,

Thanks for the feedback.

> From: Andy Lutomirski [mailto:luto@xxxxxxxxxxxxxx]
> Sent: Wednesday, October 21, 2015 9:53 PM
> To: Gilad Ben Yossef
> Cc: Chris Metcalf; Steven Rostedt; Ingo Molnar; Peter Zijlstra; Andrew
> Morton; Rik van Riel; Tejun Heo; Frederic Weisbecker; Thomas Gleixner; Paul
> E. McKenney; Christoph Lameter; Viresh Kumar; Catalin Marinas; Will Deacon;
> linux-doc@xxxxxxxxxxxxxxx; Linux API; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v8 06/14] task_isolation: provide strict mode
> configurable signal
> 

> >> >> On Tue, 20 Oct 2015 16:36:04 -0400
> >> >> Chris Metcalf <cmetcalf@xxxxxxxxxx> wrote:
> >> >>
> >> >>> Allow userspace to override the default SIGKILL delivered
> >> >>> when a task_isolation process in STRICT mode does a syscall
> >> >>> or otherwise synchronously enters the kernel.
> >> >>>
> > <snip>
> >> >
> >> > It doesn't map SIGKILL to some other signal unconditionally.  It just allows
> >> > the "hey, you broke the STRICT contract and entered the kernel" signal
> >> > to be something besides the default SIGKILL.
> >> >
> >>
> >
> > <snip>
> >>
> >> I still dislike this thing.  It seems like a debugging feature being
> >> implemented using signals instead of existing APIs.  I *still* don't
> >> see why perf can't be used to accomplish your goal.
> >>
> >
> > It is not (just) a debugging feature. There are workloads where not
> > performing an action is much preferred to being late.
> >
> > Consider the following artificial but representative scenario: a task
> > running in strict isolation is controlling a radiotherapy alpha emitter.
> > The code runs in a tight event loop, reading an MMIO register with
> > location data, performing some calculation and, in response, writing an
> > MMIO register that triggers the alpha emitter. As a safety measure, each
> > trigger is for a specific, very short time frame; the alpha emitter
> > auto-stops.
> >
> > The code has a strict assumption that no more than X cycles pass between
> > reading the value and the response, and the system is built in such a
> > way that as long as the code has mastery of the CPU the assumption holds
> > true. If something breaks this assumption (an unplanned context switch
> > to the kernel), what you want to do is just stop in place rather than
> > fire the alpha emitter X nanoseconds too late.
> >
> > This feature lets you say: if the "contract" of isolation is broken,
> > notify/kill me at once.
> 
> That's a fair point.  It's risky, though, for quite a few reasons.
> 
> 1. If someone builds an alpha emitter like this, they did it wrong.
> The kernel should write a trigger *and* a timestamp to the hardware
> and the hardware should trigger at the specified time if the time is
> in the future and throw an error if it's in the past.  If you need to
> check that you made the deadline, check the actual desired condition
> (did you meet the deadline?) not a proxy (did the signal fire?).
> 

As I wrote above, it is an *artificial* scenario.

Yes, hardware and systems can be designed better, but often they are
not, and in these kinds of systems you really do want double or triple
checks.

Knowing such systems, even IF the hardware were designed as you
specified (and I agree it should be!), you would still add the software
protection.
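For illustration, here is a minimal sketch of what such layered software protection might look like in C, assuming a POSIX system. The strict-mode signal is only one layer: its handler latches a "stop" flag. Independently, the loop compares the time elapsed since the sensor read against a budget before firing. SIGUSR1, the budget values and all names here (install_stop_handler, may_fire, emitter_disabled) are hypothetical stand-ins, and the way the patch lets an application select the signal is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Latched by the signal handler; once set, nothing fires again. */
static volatile sig_atomic_t emitter_disabled = 0;

/* Handler for the (hypothetical) strict-mode violation signal. */
static void on_isolation_broken(int sig)
{
    (void)sig;
    emitter_disabled = 1;
}

/* Install the handler for whichever signal strict mode was asked to send. */
static int install_stop_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_isolation_broken;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Monotonic timestamp in nanoseconds (e.g. taken right after the MMIO read). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Independent check, made right before the MMIO write: fire only if no
 * stop was latched and the response is still within the budget. */
static bool may_fire(uint64_t read_ns, uint64_t fire_ns, uint64_t budget_ns)
{
    assert(fire_ns >= read_ns);
    return !emitter_disabled && (fire_ns - read_ns) <= budget_ns;
}
```

Either layer alone vetoes the trigger. If the signal is never delivered (for example, because an SMI stole the cycles), the elapsed-time check still catches the late response.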

> 2. This strict mode thing isn't exhaustive.  It's missing, at least,
> coverage for NMI, MCE, and SMI.  Sure, you can think that you've
> disabled all NMI sources, you can try to remember to set the
> appropriate boot flag that panics on MCE (and hope that you don't get
> screwed by broadcast MCE on Intel systems before it got fixed
> (Skylake?  Is the fix even available in a released chip?), and, for
> SMI, good luck...

You are right, it isn't exhaustive. It is one piece of a bigger puzzle.
Many of the other pieces are platform specific, and some of them have
been dealt with on the platforms that care about these things.

Yes, we don't have dark magic to detect SMIs. Is that a reason to penalize
platforms where there is no such thing as SMI? 

> 3. You haven't dealt with IPIs.  The TLB flush code in particular
> seems like it will break all your assumptions.
>

But we have, in the general context. Consider this patch set from 2012:
https://lwn.net/Articles/479510/

It is certainly not finished, but what we have now is useful enough that
it is used in the real world for different workloads on different
platforms, from packet processing through HPC to high-frequency trading.

> Maybe it would make sense to whack more of the moles before adding a
> big assertion that there aren't any moles any more.
> 

Hm... maybe you are reading too much into this specific feature. It's a
"notify me, the application, if I asked you to do something that violates
my previous request to be isolated" feature, rather than "notify me
whenever isolation is broken".

Does that make more sense?
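To make the distinction concrete, here is a toy model in C (not kernel code) of which ways of losing the CPU the strict-mode signal is meant to report, as described in this thread: entries the task itself causes synchronously, such as a syscall or a fault, but not asynchronous events like IPIs, NMIs or SMIs. The enum and function names are hypothetical, and the exact set of entries covered by the patch may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for ways an isolated task can lose its CPU. */
enum isolation_break {
    BREAK_SYSCALL,    /* the task asked the kernel to do something */
    BREAK_PAGE_FAULT, /* the task touched memory needing kernel work */
    BREAK_TRAP,       /* another synchronous exception caused by the task */
    BREAK_DEVICE_IRQ, /* asynchronous: a device interrupted the CPU */
    BREAK_IPI,        /* asynchronous: another CPU interrupted this one */
    BREAK_NMI,        /* asynchronous and non-maskable */
    BREAK_SMI         /* handled by firmware; the kernel does not see it */
};

/* Returns true if, in this model, strict mode would deliver its
 * configured signal: only for entries the task itself triggered. */
static bool strict_mode_signals(enum isolation_break b)
{
    assert((int)b >= (int)BREAK_SYSCALL && (int)b <= (int)BREAK_SMI);
    switch (b) {
    case BREAK_SYSCALL:
    case BREAK_PAGE_FAULT:
    case BREAK_TRAP:
        return true;  /* "I asked you to do something" */
    default:
        return false; /* isolation broken, but not at the task's request */
    }
}
```

In this model, the asynchronous cases are exactly the "moles" from points 2 and 3, which need to be addressed by other means.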

Thanks,
Gilad