* Don Zickus <dzickus@xxxxxxxxxx> wrote:

> On Tue, May 17, 2011 at 02:41:53PM +0800, Huang Ying wrote:
> > On 05/17/2011 03:33 AM, Don Zickus wrote:
> > > On Tue, May 10, 2011 at 11:08:41AM +0800, Huang Ying wrote:
> > >> The testing of Generic Hardware Error Source (GHES) is quite
> > >> difficult, because special hardware is needed to trigger the hardware
> > >> error. So a software based hardware error injector for GHES is
> > >> implemented.
> > >>
> > >> Error notification is not provided in this patch. So you still need
> > >> some NMI/SCI/IRQ injecting support to make it work.
> > >
> > > Should we add that to this patch, otherwise it seems like the injection
> > > isn't very useful or intuitive from the end-user perspective that they
> > > have to provide their own notification source (ie NMI/SCI/MCE/IRQ).
> >
> > We can provide the NMI/SCI/IRQ injecting in another patch. What do you
> > think about the NMI injecting patch attached?
>
> I understand what the patch is doing and I like the various injection
> points, but looking at your other injection modules I start to wonder if
> there is a smarter and easier way to do all this. I believe the software
> injection is definitely useful but it does add bloat to the kernel.
>
> I am starting to like Ingo's event filtering idea for stuff like this I
> think (though I am still wrapping my head around it). The beauty of
> kprobes and tracepoints and even jump labels was that they were not very
> intrusive, they did their work on the side. It would be nice if we could
> figure out a framework for the injection stuff that did something similar.
>
> Perhaps Ingo has some ideas?

Boris has injection in the EDAC code as well and wants it for RAS purposes
and i recently outlined to him how event injection could possibly look in
the not so far future:

---------------->
I think the model we want is to inject actual perf events at the *kernel*
level, and to add the ability for some events (MCE events here) to also run
an (optional) callback once user-space does that injection.

So for example [sufficiently privileged] user-space could inject *any* perf
event - for example a PERF_COUNT_HW_CACHE_MISSES event (for test purposes)
and any tooling that runs could not tell apart this injected event from a
real event.

Once we have that, adding an injection callback to MCE events is just
another step: such a callback could propagate the injected event to the
real hardware for example, if that is possible. (it would validate, etc. as
well)

In the generic case the event just gets injected into the perf event
stream.

The ABI for injection could be some obvious extension, either another ioctl
variant to the perf fd itself, we already have various ways to access it:

 #define PERF_EVENT_IOC_ENABLE           _IO ('$', 0)
 #define PERF_EVENT_IOC_DISABLE          _IO ('$', 1)
 #define PERF_EVENT_IOC_REFRESH          _IO ('$', 2)
 #define PERF_EVENT_IOC_RESET            _IO ('$', 3)
 #define PERF_EVENT_IOC_PERIOD           _IOW('$', 4, __u64)
 #define PERF_EVENT_IOC_SET_OUTPUT       _IO ('$', 5)
 #define PERF_EVENT_IOC_SET_FILTER       _IOW('$', 6, char *)

Or sys_write() access to the perf event fd. The sys_write() one looks like
the conceptually nicest solution to me, because we can read() the fd as
well to get event (counts..) out of it.

I think this model would give us a *lot* of testing power, and we could
utilize arbitrary hardware-injection capabilities as well.
<----------------

That way what would remain in mm/memory-failure.c file
is all the useful (and interesting!)
MM specific knowledge: the method of getting to a list of affected tasks for
policy action, to collect the tasks that are affected by an anonymous page
going bad, or by a pagecache page going bad, etc.

These would be offered as filter action functionality, and could be
triggered from filters straight in the kernel, without having to touch a
user-space daemon.

The whole boring transport, filtering, enumeration and configuration that is
duplicated here would go away and would be replaced by EVENT() definitions
in the places that generate events and callbacks to filter action in
mm/memory-inject.c.

Now what is somewhat unfortunate as a practical matter is that some of this
functionality has already been exposed in semi-ABI ways in an ad-hoc
fashion, so some of the design may be hardcoded. That does not keep me from
pointing out when i see the mess growing ... :-)

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
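
For readers trying to picture the injection model outlined in the mail
above, here is a small user-space toy in C. It is only a sketch under
stated assumptions, not kernel code and not the perf ABI: every name in it
(struct stream, stream_emit(), stream_inject(), mce_validate(), the EV_*
types) is hypothetical and invented for illustration. It shows two things
from the description: an injected event lands in the same stream, in the
same form, as a "real" one, and an event type may carry an optional
callback that validates the injection first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event types; stand-ins for real perf event ids. */
enum ev_type { EV_CACHE_MISS, EV_MCE, EV_TYPE_MAX };

struct event {
	enum ev_type type;
	uint64_t     data;
};

/* Optional per-type injection callback: return 0 to accept the event. */
typedef int (*inject_cb)(const struct event *ev);

#define STREAM_CAP 16

struct stream {
	struct event buf[STREAM_CAP];
	size_t       len;
	inject_cb    on_inject[EV_TYPE_MAX];	/* NULL means "no callback" */
};

void stream_init(struct stream *s)
{
	*s = (struct stream){0};
}

/* Path taken by "real" events. Note that no origin flag is recorded. */
int stream_emit(struct stream *s, struct event ev)
{
	if (s->len == STREAM_CAP)
		return -1;
	s->buf[s->len++] = ev;
	return 0;
}

/*
 * Path taken by injected events: run the optional callback for this
 * event type, then fall through to the very same emit path.
 */
int stream_inject(struct stream *s, struct event ev)
{
	inject_cb cb;

	if ((unsigned)ev.type >= EV_TYPE_MAX)
		return -1;
	cb = s->on_inject[ev.type];
	if (cb && cb(&ev) != 0)
		return -1;	/* callback rejected the injection */
	return stream_emit(s, ev);
}

/* Example callback (hypothetical): reject MCE events with empty data. */
int mce_calls;

int mce_validate(const struct event *ev)
{
	mce_calls++;
	return ev->data != 0 ? 0 : -1;
}
```

A consumer that walks s->buf has nothing to tell an injected
EV_CACHE_MISS from an emitted one, which is the property the mail asks
for. In this toy, only event types with a registered callback (EV_MCE via
mce_validate() here) get the extra validation step.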