On Sat, Oct 17 2020 at 01:08, Alex Belits wrote:
> On Mon, 2020-10-05 at 14:52 -0400, Nitesh Narayan Lal wrote:
>> On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> I think that the goal of "finding source of disturbance" interface is
> different from what can be accomplished by tracing in two ways:
>
> 1. "Source of disturbance" should provide some useful information about
> category of event and it cause as opposed to determining all precise
> details about things being called that resulted or could result in
> disturbance. It should not depend on the user's knowledge about
> details

Tracepoints already give you selectively useful information.

> of implementations, it should provide some definite answer of what
> happened (with whatever amount of details can be given in a generic
> mechanism) even if the user has no idea how those things happen and
> what part of kernel is responsible for either causing or processing
> them. Then if the user needs further details, they can be obtained with
> tracing.

It's just a matter of defining the tracepoint at the right place.

> 2. It should be usable as a runtime error handling mechanism, so the
> information it provides should be suitable for application use and
> logging. It should be usable when applications are running on a system
> in production, and no specific tracing or monitoring mechanism can be
> in use.

That's a strawman really. There is absolutely no reason why a specific
set of tracepoints cannot be enabled on a production system.

Your tracker is a monitoring mechanism, just a different flavour. By
your logic above it cannot be enabled on a production system either.

Also you can enable tracepoints from a control application, consume,
log and act upon them. It's not any different from opening some magic
isolation tracker interface. There are even multiple ways to do that
including libraries.
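A minimal sketch of such a control application, written in Python for brevity. It assumes tracefs is mounted at /sys/kernel/tracing, that the program runs as root, and that the isolated CPU list and IRQ number are hypothetical inputs. The tracefs files it touches (tracing_on, tracing_cpumask, events/irq/irq_handler_entry/enable and filter, trace_pipe) are the standard ftrace interface; the helper names cpumask_hex, build_plan, apply_plan, parse_irq_line and watch are made up for illustration. C programs can do the same through libtracefs.

```python
import re
from pathlib import Path

# Assumption: tracefs is mounted here (older setups use
# /sys/kernel/debug/tracing) and the caller runs as root.
TRACEFS = Path("/sys/kernel/tracing")

# Helper names below are hypothetical; only the tracefs file names are
# the real ftrace interface.

# irq:irq_handler_entry as printed by trace_pipe, e.g.
#   "  <idle>-0  [003] d.h1.  512.000001: irq_handler_entry: irq=42 name=eth0"
# The flags column varies between kernel versions, so it is skipped loosely.
IRQ_LINE = re.compile(
    r"\[(?P<cpu>\d+)\].*?\s(?P<ts>\d+\.\d+):\s+irq_handler_entry:\s+"
    r"irq=(?P<irq>\d+)\s+name=(?P<name>\S+)"
)


def cpumask_hex(cpus):
    """Format CPU numbers as a tracing_cpumask value: 32-bit hex words,
    most significant first, comma separated (e.g. CPUs 2,3 -> 'c')."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    words = []
    while True:
        words.append(mask & 0xFFFFFFFF)
        mask >>= 32
        if not mask:
            break
    words.reverse()
    return ",".join([format(words[0], "x")] + [format(w, "08x") for w in words[1:]])


def build_plan(isolated_cpus, irq=None, root=TRACEFS):
    """Return the (file, value) writes that record IRQ entries only on
    the given CPUs, optionally only for one IRQ number."""
    event = root / "events" / "irq" / "irq_handler_entry"
    return [
        (root / "tracing_on", "0"),  # pause recording while configuring
        (root / "tracing_cpumask", cpumask_hex(isolated_cpus)),
        # "0" clears any previous filter on this event.
        (event / "filter", f"irq == {irq}" if irq is not None else "0"),
        (event / "enable", "1"),
        (root / "tracing_on", "1"),
    ]


def apply_plan(plan):
    """Perform the writes, like `echo value > file` from a shell."""
    for path, value in plan:
        path.write_text(value + "\n")


def parse_irq_line(line):
    """Return (cpu, timestamp, irq, name), or None if the line doesn't match."""
    m = IRQ_LINE.search(line)
    if m is None:
        return None
    return int(m["cpu"]), float(m["ts"]), int(m["irq"]), m["name"]


def watch(root=TRACEFS):
    """Block on trace_pipe (a consuming read) and yield parsed IRQ events."""
    with open(root / "trace_pipe") as pipe:
        for line in pipe:
            event = parse_irq_line(line)
            if event is not None:
                yield event
```

Usage, as root: call apply_plan(build_plan([2, 3])) and then loop over watch(), logging each (cpu, timestamp, irq, name) tuple or acting on it. Restricting tracing_cpumask limits recording to the isolated CPUs, and the per-event filter narrows the stream further, without any new kernel interface.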
> If, say, thousands of devices are controlling neutrino detectors on an
> ocean floor, and in a month of work one of them got one isolation
> breaking event, it should be able to report that isolation was broken
> by an interrupt from a network interface, so the users will be able to
> track it down to some userspace application reconfiguring those
> interrupts.

Tracing can do that and it can do it selectively on the isolated CPUs.
It's just a matter of proper configuration and usage.

> It will be a good idea to make such mechanism optional and suitable for
> tracking things on conditions other than "always enabled" and "enabled
> with task isolation".

Tracing already provides that. Tracepoints are individually controlled
and filtered.

> However in my opinion, there should be something in kernel entry
> procedure that, if enabled, prepared something to be filled by the
> cause data, and we know at least one such situation when this kernel
> entry procedure should be triggered -- when task isolation is on.

A tracepoint will gather that information for you.

Task isolation is not special, it's just yet another way to configure
and use a system and tracepoints provide everything you need with the
bonus that you can gather more correlated information when you need it.

In fact tracing and tracepoints have replaced all specialized trackers
which were in the kernel before tracing was available. We're not going
to add a new one just because.

If there is anything which you find that tracing and tracepoints cannot
provide then the obvious solution is to extend that infrastructure so it
can serve your usecase.

Thanks,

        tglx