Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

Nitesh Narayan Lal <nitesh@xxxxxxxxxx> · Mon, 5 Oct 2020 14:52:49 -0400

On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> On Sun, Oct 04, 2020 at 02:44:39PM +0000, Alex Belits wrote:
>> On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
>>> On Wed, Jul 22, 2020 at 02:49:49PM +0000, Alex Belits wrote:
>>>> +/*
>>>> + * Description of the last two tasks that ran isolated on a given CPU.
>>>> + * This is intended only for messages about isolation breaking. We
>>>> + * don't want any references to actual task while accessing this from
>>>> + * CPU that caused isolation breaking -- we know nothing about timing
>>>> + * and don't want to use locking or RCU.
>>>> + */
>>>> +struct isol_task_desc {
>>>> +	atomic_t curr_index;
>>>> +	atomic_t curr_index_wr;
>>>> +	bool	warned[2];
>>>> +	pid_t	pid[2];
>>>> +	pid_t	tgid[2];
>>>> +	char	comm[2][TASK_COMM_LEN];
>>>> +};
>>>> +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
>>> So that's quite a huge patch that would have needed to be split up.
>>> Especially this tracing engine.
>>>
>>> Speaking of which, I agree with Thomas that it's unnecessary. It's too much
>>> code and complexity. We can use the existing trace events and perform the
>>> analysis from userspace to find the source of the disturbance.
>> The idea behind this is that isolation breaking events are supposed to
>> be known to the applications while applications run normally, and they
>> should not require any analysis or human intervention to be handled.
> Sure but you can use trace events for that. Just trace interrupts, workqueues,
> timers, syscalls, exceptions and scheduler events and you get all the local
> disturbance. You might want to tune a few filters but that's pretty much it.
>
> As for the source of the disturbances, if you really need that information,
> you can trace the workqueue and timer queue events and just filter those that
> target your isolated CPUs.
>

I agree that we can do all of those things with tracing.
However, IMHO a simplified logging mechanism that reports the source of a
violation may help reduce the manual effort.

That said, I am not sure how easy it will be to maintain such an interface
over time.
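
To make the manual effort concrete, here is a minimal userspace sketch of the
kind of filtering the tracing approach requires. It assumes the default text
layout of the ftrace "trace" file, where the recording CPU appears as "[NNN]".
The helper names trace_line_cpu() and hits_isolated_cpu() and the 64-bit mask
are hypothetical and are not part of the patch or of any kernel interface. It
only covers local disturbance (events recorded on an isolated CPU), not events
that target one from elsewhere.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the CPU number from one line of ftrace text
 * output. Assumes the CPU is printed as "[NNN]"; returns -1 if no such
 * field is found.
 */
static int trace_line_cpu(const char *line)
{
	const char *p = strchr(line, '[');
	int cpu;
	char close;

	while (p) {
		/* Accept only "[<non-negative integer>]". */
		if (sscanf(p, "[%d%c", &cpu, &close) == 2 &&
		    close == ']' && cpu >= 0)
			return cpu;
		p = strchr(p + 1, '[');
	}
	return -1;
}

/*
 * Hypothetical helper: true if the event was recorded on a CPU whose bit is
 * set in isolated_mask (bit N stands for CPU N, limited to 64 CPUs here).
 */
static bool hits_isolated_cpu(const char *line,
			      unsigned long long isolated_mask)
{
	int cpu = trace_line_cpu(line);

	if (cpu < 0 || cpu >= 64)
		return false;
	return (isolated_mask >> cpu) & 1ULL;
}
```

A tool built this way would read the trace output line by line and keep only
the lines for which hits_isolated_cpu() returns true, which is roughly the
per-application work that a dedicated logging mechanism would avoid.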

--
Thanks
Nitesh
