Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

Frederic Weisbecker <frederic@xxxxxxxxxx> · Tue, 6 Oct 2020 12:35:41 +0200

On Mon, Oct 05, 2020 at 02:52:49PM -0400, Nitesh Narayan Lal wrote:
> 
> On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> > On Sun, Oct 04, 2020 at 02:44:39PM +0000, Alex Belits wrote:
> >> On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
> >>> On Wed, Jul 22, 2020 at 02:49:49PM +0000, Alex Belits wrote:
> >>>> +/*
> >>>> + * Description of the last two tasks that ran isolated on a given CPU.
> >>>> + * This is intended only for messages about isolation breaking. We
> >>>> + * don't want any references to actual task while accessing this from
> >>>> + * CPU that caused isolation breaking -- we know nothing about timing
> >>>> + * and don't want to use locking or RCU.
> >>>> + */
> >>>> +struct isol_task_desc {
> >>>> +	atomic_t curr_index;
> >>>> +	atomic_t curr_index_wr;
> >>>> +	bool	warned[2];
> >>>> +	pid_t	pid[2];
> >>>> +	pid_t	tgid[2];
> >>>> +	char	comm[2][TASK_COMM_LEN];
> >>>> +};
> >>>> +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
> >>> So that's quite a huge patch that would have needed to be split up.
> >>> Especially this tracing engine.
> >>>
> >>> Speaking of which, I agree with Thomas that it's unnecessary. It's too much
> >>> code and complexity. We can use the existing trace events and perform the
> >>> analysis from userspace to find the source of the disturbance.
> >> The idea behind this is that isolation breaking events are supposed to
> >> be known to the applications while applications run normally, and they
> >> should not require any analysis or human intervention to be handled.
> > Sure but you can use trace events for that. Just trace interrupts, workqueues,
> > timers, syscalls, exceptions and scheduler events and you get all the local
> > disturbance. You might want to tune a few filters but that's pretty much it.
> >
> > As for the source of the disturbances, if you really need that information,
> > you can trace the workqueue and timer queue events and just filter those that
> > target your isolated CPUs.
> >
> 
> I agree that we can do all those things with tracing.
> However, IMHO having a simplified logging mechanism to gather the source of
> violation may help in reducing the manual effort.
> 
> Although, I am not sure how easy it will be to maintain such an interface
> over time.

The thing is: tracing is your simplified logging mechanism here. You can achieve
the same in userspace with _way_ less code, no race, and you can do it in
bash.
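
To illustrate the point, here is a minimal sketch of what such a userspace
setup could look like. It is not taken from the patch set. It assumes tracefs
is mounted at /sys/kernel/tracing and that the irq, workqueue, timer, sched
and raw_syscalls event groups are available on the running kernel. The
function names enable_disturbance_events and filter_cpus are hypothetical
helpers, and the filter assumes the first bracketed number on a trace line is
the CPU field.

```shell
#!/bin/sh
# Sketch only: the tracefs mount point and the event list are assumptions.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

# Enable the event groups mentioned in this thread (needs root).
# This function is only defined here, it is not called automatically.
enable_disturbance_events() {
    for group in irq workqueue timer sched raw_syscalls; do
        echo 1 > "$TRACEFS/events/$group/enable"
    done
    echo 1 > "$TRACEFS/tracing_on"
}

# Keep only the trace lines recorded on the given CPUs.
# Usage: filter_cpus "2 3" < "$TRACEFS/trace"
filter_cpus() {
    awk -v cpus="$1" '
        BEGIN {
            n = split(cpus, list, " ")
            for (i = 1; i <= n; i++)
                want[list[i] + 0] = 1
        }
        # Assumes the first "[NNN]" field on the line is the CPU number.
        match($0, /\[[0-9]+\]/) {
            cpu = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (cpu in want)
                print
        }
    '
}
```

With CPUs 2 and 3 isolated, something like
`enable_disturbance_events; filter_cpus "2 3" < /sys/kernel/tracing/trace_pipe`
would print only the events that ran on those CPUs. Header lines starting
with '#' carry no bracketed CPU field and are dropped.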

Thanks.