On Mon, Oct 05, 2020 at 02:52:49PM -0400, Nitesh Narayan Lal wrote:
>
> On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> > On Sun, Oct 04, 2020 at 02:44:39PM +0000, Alex Belits wrote:
> >> On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
> >>> External Email
> >>>
> >>> ----------------------------------------------------------------------
> >>> On Wed, Jul 22, 2020 at 02:49:49PM +0000, Alex Belits wrote:
> >>>> +/*
> >>>> + * Description of the last two tasks that ran isolated on a given CPU.
> >>>> + * This is intended only for messages about isolation breaking. We
> >>>> + * don't want any references to actual task while accessing this from
> >>>> + * CPU that caused isolation breaking -- we know nothing about timing
> >>>> + * and don't want to use locking or RCU.
> >>>> + */
> >>>> +struct isol_task_desc {
> >>>> +	atomic_t curr_index;
> >>>> +	atomic_t curr_index_wr;
> >>>> +	bool	warned[2];
> >>>> +	pid_t	pid[2];
> >>>> +	pid_t	tgid[2];
> >>>> +	char	comm[2][TASK_COMM_LEN];
> >>>> +};
> >>>> +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
> >>> So that's quite a huge patch that would have needed to be split up.
> >>> Especially this tracing engine.
> >>>
> >>> Speaking of which, I agree with Thomas that it's unnecessary. It's too
> >>> much code and complexity. We can use the existing trace events and
> >>> perform the analysis from userspace to find the source of the
> >>> disturbance.
> >> The idea behind this is that isolation breaking events are supposed to
> >> be known to the applications while applications run normally, and they
> >> should not require any analysis or human intervention to be handled.
> > Sure but you can use trace events for that. Just trace interrupts, workqueues,
> > timers, syscalls, exceptions and scheduler events and you get all the local
> > disturbance. You might want to tune a few filters but that's pretty much it.
> >
> > As for the source of the disturbances, if you really need that information,
> > you can trace the workqueue and timer queue events and just filter those that
> > target your isolated CPUs.
> >
>
> I agree that we can do all those things with tracing.
> However, IMHO having a simplified logging mechanism to gather the source of
> violation may help in reducing the manual effort.
>
> Although, I am not sure how easy will it be to maintain such an interface
> over time.

The thing is: tracing is your simplified logging mechanism here. You can achieve
the same in userspace with _way_ less code, no race, and you can do it in bash.

Thanks.
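
To give a rough idea of the userspace side, here is a minimal sketch of the
filtering step, written in Python only to keep the parsing explicit. It assumes
the default ftrace text layout (TASK-PID [CPU] FLAGS TIMESTAMP: EVENT: DETAILS)
and that the workqueue_queue_work and timer_start payloads carry a "cpu=N"
field naming the target CPU. The helper names (parse, disturbances) and the
sample lines are made up for illustration; none of this comes from the patch.

```python
import re

# Assumed layout of one line of ftrace text output:
#   TASK-PID [CPU] FLAGS TIMESTAMP: EVENT: DETAILS
TRACE_LINE = re.compile(
    r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s*(?P<details>.*)$"
)

# Matches "cpu=N" but not "req_cpu=N" (assumed payload field name).
TARGET_CPU = re.compile(r"(?<!\w)cpu=(\d+)")

# Events that queue work for a possibly different CPU.
QUEUE_EVENTS = {"workqueue_queue_work", "timer_start"}


def parse(line):
    """Return a dict for one trace line, or None for headers and comments."""
    m = TRACE_LINE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["pid"] = int(rec["pid"])
    rec["cpu"] = int(rec["cpu"])
    return rec


def disturbances(lines, isolated):
    """Yield (kind, record) for events that touch an isolated CPU.

    "local"  : the event itself ran on an isolated CPU.
    "remote" : another CPU queued a work item or timer targeting one.
    """
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        if rec["cpu"] in isolated:
            yield "local", rec
        elif rec["event"] in QUEUE_EVENTS:
            m = TARGET_CPU.search(rec["details"])
            if m and int(m.group(1)) in isolated:
                yield "remote", rec


# Made-up sample input, only to show the shape of the data.
SAMPLE = """\
# tracer: nop
          <idle>-0       [003] d.h1  100.000100: irq_handler_entry: irq=30 name=eth0
     kworker/0:1-42      [000] d..1  100.000200: workqueue_queue_work: work struct=00000000abcd function=vmstat_update workqueue=mm_percpu_wq req_cpu=3 cpu=3
            bash-999     [001] ....  100.000300: sys_enter: NR 1 (1, 2, 3, 4, 5, 6)
""".splitlines()

if __name__ == "__main__":
    for kind, rec in disturbances(SAMPLE, isolated={3}):
        print(kind, rec["ts"], rec["event"], rec["comm"], rec["pid"])
```

In practice the lines would come from trace_pipe under tracefs after enabling
the relevant events, instead of from SAMPLE. Using the per-event filters in the
kernel to do the CPU matching would cut the volume further, but the idea stays
the same: the "who disturbed CPU N" question is answered by a filter over
existing events.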