On Wed, 2020-12-02 at 13:59 +0000, Mark Rutland wrote:
> Hi Alex,
>
> On Mon, Nov 23, 2020 at 05:58:06PM +0000, Alex Belits wrote:
> > In do_notify_resume(), call
> > task_isolation_before_pending_work_check() first, to report
> > isolation breaking, then after handling all pending work, call
> > task_isolation_start() for TIF_TASK_ISOLATION tasks.
> >
> > Add _TIF_TASK_ISOLATION to _TIF_WORK_MASK and _TIF_SYSCALL_WORK,
> > and define a local NOTIFY_RESUME_LOOP_FLAGS to check in the loop,
> > since we don't clear _TIF_TASK_ISOLATION in the loop.
> >
> > Early kernel entry code calls task_isolation_kernel_enter(). In
> > particular:
> >
> > Vectors:
> > el1_sync -> el1_sync_handler() -> task_isolation_kernel_enter()
> > el1_irq -> asm_nmi_enter(), handle_arch_irq()
> > el1_error -> do_serror()
> > el0_sync -> el0_sync_handler()
> > el0_irq -> handle_arch_irq()
> > el0_error -> do_serror()
> > el0_sync_compat -> el0_sync_compat_handler()
> > el0_irq_compat -> handle_arch_irq()
> > el0_error_compat -> do_serror()
> >
> > SDEI entry:
> > __sdei_asm_handler -> __sdei_handler() -> nmi_enter()
>
> As a heads-up, the arm64 entry code is changing, as we found that our
> lockdep, RCU, and context-tracking management wasn't quite right. I
> have a series of patches:
>
> https://lore.kernel.org/r/20201130115950.22492-1-mark.rutland@xxxxxxx
>
> ... which are queued in the arm64 for-next/fixes branch. I intend to
> have some further rework ready for the next cycle.

Thanks!

> I'd appreciate it if you could Cc me on any patches altering the
> arm64 entry code, as I have a vested interest.

I will do that.

> That was quite obviously broken if PROVE_LOCKING and NO_HZ_FULL were
> chosen and context tracking was in use (e.g. with
> CONTEXT_TRACKING_FORCE),

I am not yet sure about TRACE_IRQFLAGS; however, NO_HZ_FULL and
CONTEXT_TRACKING have to be enabled for it to do anything. I will
check it with PROVE_LOCKING and your patches.

The entry code only adds an inline function that, if task isolation is
enabled, uses raw_local_irq_save() / raw_local_irq_restore() and other
low-level operations, and accesses per-CPU variables by offset, so at
the very least it should not add any problems (a rough sketch of that
function is below). Even raw_local_irq_save() / raw_local_irq_restore()
should probably be removed; however, I wanted to have something that
can be safely called if for whatever reason interrupts were enabled
before the kernel was fully entered.

> so I'm assuming that this series has not been tested in that
> configuration. What sort of testing has this seen?

On various available arm64 hardware, with these options enabled:

  CONFIG_TASK_ISOLATION
  CONFIG_NO_HZ_FULL
  CONFIG_HIGH_RES_TIMERS

and these disabled:

  CONFIG_HZ_PERIODIC
  CONFIG_NO_HZ_IDLE
  CONFIG_NO_HZ

> It would be very helpful for the next posting if you could provide
> any instructions on how to test this series (e.g. with pointers to
> any test suite that you have), since it's very easy to introduce
> subtle breakage in this area without realising it.

I will. Currently libtmc ( https://github.com/abelits/libtmc )
contains all the userspace code used for testing; however, I should
document the testing procedures.
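To make the point about the entry hook concrete, this is roughly its
shape. A simplified sketch, not the exact code from the series; the
per-CPU variable name and the unconditional write are illustrative:

	#include <linux/percpu.h>
	#include <linux/irqflags.h>

	/* Illustrative name; the series may use a different one. */
	DECLARE_PER_CPU(unsigned long, ll_isol_flags);

	static inline void task_isolation_kernel_enter(void)
	{
		unsigned long flags;

		if (!IS_ENABLED(CONFIG_TASK_ISOLATION))
			return;

		/*
		 * Interrupts may still be enabled if we were entered
		 * through an unusual path, so mask them around the
		 * per-CPU update.
		 */
		raw_local_irq_save(flags);
		/* Per-CPU write only: no locks, no RCU, no tracing. */
		this_cpu_write(ll_isol_flags, 0);
		raw_local_irq_restore(flags);
	}

Nothing in it can sleep, take locks, or call instrumented code, which
is why it should be safe from any of the entry points listed above,
even very early.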
> > Functions called from there:
> > asm_nmi_enter() -> nmi_enter() -> task_isolation_kernel_enter()
> > asm_nmi_exit() -> nmi_exit() -> task_isolation_kernel_return()
> >
> > Handlers:
> > do_serror() -> nmi_enter() -> task_isolation_kernel_enter()
> >                or task_isolation_kernel_enter()
> > el1_sync_handler() -> task_isolation_kernel_enter()
> > el0_sync_handler() -> task_isolation_kernel_enter()
> > el0_sync_compat_handler() -> task_isolation_kernel_enter()
> >
> > handle_arch_irq() is irqchip-specific, most call
> > handle_domain_irq(). There is a separate patch for irqchips that
> > do not follow this rule.
> >
> > handle_domain_irq() -> task_isolation_kernel_enter()
> > do_handle_IPI() -> task_isolation_kernel_enter() (may be redundant)
> > nmi_enter() -> task_isolation_kernel_enter()
>
> The IRQ cases look very odd to me. With the rework I've just done for
> arm64, we'll do the regular context tracking accounting before we
> ever get into handle_domain_irq() or similar, so I suspect that's not
> necessary at all?

The goal is to call task_isolation_kernel_enter() before anything that
depends on CPU state, including the pipeline, that could remain
unsynchronized while the rest of the kernel was sending
synchronization IPIs. Similarly, task_isolation_kernel_return() should
be called when it is safe to turn off synchronization. If the rework
allows this to be done earlier, there is no need to touch the more
specific functions.

> > --- a/arch/arm64/include/asm/barrier.h
> > +++ b/arch/arm64/include/asm/barrier.h
> > @@ -49,6 +49,7 @@
> >  #define dma_rmb()	dmb(oshld)
> >  #define dma_wmb()	dmb(oshst)
> >
> > +#define instr_sync()	isb()
>
> I think I've asked on prior versions of the patchset, but what is
> this for? Where is it going to be used, and what is the expected
> semantics? I'm wary of exposing this outside of arch code because
> there aren't strong cross-architectural semantics, and at the least
> this requires some documentation.

This is intended as an instruction pipeline flush for the situation
when arch-independent code has to synchronize with potential changes
that it missed. This is necessary because other CPUs could have
modified code (and sent IPIs to notify the rest, but not the isolated
CPU) while this one was still running an isolated task or, more
likely, exiting from it, so it might be unlucky enough to pick up the
old instructions before that point. It's only used on kernel entry
(see the sketch further below).

> If it's unused, please delete it.
>
> [...]
>
> > diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> > index 43d4c329775f..8152760de683 100644
> > --- a/arch/arm64/kernel/entry-common.c
> > +++ b/arch/arm64/kernel/entry-common.c
> > @@ -8,6 +8,7 @@
> >  #include <linux/context_tracking.h>
> >  #include <linux/ptrace.h>
> >  #include <linux/thread_info.h>
> > +#include <linux/isolation.h>
> >
> >  #include <asm/cpufeature.h>
> >  #include <asm/daifflags.h>
> > @@ -77,6 +78,8 @@ asmlinkage void notrace el1_sync_handler(struct pt_regs *regs)
> >  {
> >  	unsigned long esr = read_sysreg(esr_el1);
> >
> > +	task_isolation_kernel_enter();
>
> For regular context tracking we only account the user<->kernel
> transitions.
>
> This is a kernel->kernel transition, so surely this is not necessary?

Right. If we entered the kernel from an isolated task, we have already
changed the flags.

> If nothing else, it doesn't feel well-balanced.
>
> I have not looked at the rest of this patch (or series) in detail.
>
> Thanks,
> Mark.
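To expand on the instr_sync() answer, the intended use on kernel entry
from an isolated task is essentially the following. Again an
illustrative sketch rather than the exact code from the series,
reusing the made-up ll_isol_flags name from the earlier sketch:

	/*
	 * On entry from an isolated task: leave isolation, then
	 * discard any instructions the CPU may have prefetched while
	 * it was being skipped by kernel text synchronization IPIs.
	 */
	if (this_cpu_read(ll_isol_flags) != 0) {
		this_cpu_write(ll_isol_flags, 0);
		instr_sync();	/* isb() on arm64 */
	}

On arm64 this maps to isb(); other architectures would have to supply
an equivalent pipeline-serializing definition, which is exactly the
cross-architecture semantics question raised above.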
My goal was to make sure that all transitions between the kernel and
userspace are covered, so when in doubt I have added the corresponding
calls to those inline functions, and made them safe to call from those
places. With the improved entry-exit code it should be easier to be
sure where this can be done in a cleaner way.

-- 
Alex