* Luck, Tony <tony.luck@xxxxxxxxx> wrote:

> In your proposed solution, we'd generate an event that would be
> handled by some process/daemon ... but how would we ensure that the
> affected process does not run in the meantime? Could we create
> some analogous method to the ptrace stopped state, and hand control
> of the affected process to the daemon that gets the event?

Ok, i think there is a bit of a misunderstanding here - which is not
a surprise really: we made generic arguments all along with very few
specifics.

The RAS daemon would deal with 'slow' policy action: fully recovered
events. It would also log various events so that people can do post
mortem etc.

The main point of defining events here is so that there's a single
method of transport and a single flexible method of defining and
extracting events.

Some of the event processing would occur in the kernel: in code that
knows about memory_failure() and calls it while making sure we do not
execute any user-space instruction.

Some of the code would execute *very* early and in a very atomic way,
still in NMI context: panicking the box if the error is that severe.

Neither of these is a step that the RAS daemon can or wants to
handle.

The RAS tools would interact with the regular perf facilities,
setting and configuring the various RAS-related events. They'd handle
the 'severity' config bits, they'd initiate testing (injection), etc.

Ideally the RAS daemon and tools would do what syslog does (and
more), with more structured events.

At the end of the day most of the 'policy action' is taken by humans
anyway, who want to take a look at some ASCII output. So printk()
integration and obvious ASCII output for everything is important
along the way.

> 2) The memory error was found in certain special sections of the
>    kernel for which recovery is possible (e.g. while copying to/from
>    user memory, perhaps also page copy and page clear).
>
> Here I don't have a solution. TIF_MCE_NOTIFY isn't checked when
> returning from do_machine_check() to kernel code.

Well, since we are already in interrupt context (albeit in a very
atomic NMI context), sending a self-IPI is not strictly necessary. We
could fix up the return address and jump to the right handler
straight away during the IRET.

A self-IPI might also not execute *immediately* - there's always the
chance of APIC-related delays.

> In a CONFIG_PREEMPT=y kernel, all of the recoverable cases ought to
> be in places where pre-emption is allowed ... so perhaps we can
> also use the stop-and-switch option here?

Yes, these are generally preemptible cases - and if they are not we
can make the error fatal (we do not have to handle *every* complex
case, giving up is a fair answer as well - we do not want rare code
to be complex really).

But you don't need to stop-and-switch: just stack-nesting on top of
whatever preemptible code was running there would be enough, wouldn't
it? That stops a task from executing until the decision has been made
whether it can continue or not.

Thanks,

	Ingo
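
To make the split described above a bit more concrete, here is a
minimal user-space sketch of how an event could be routed to one of
the three stages (panic in NMI context, in-kernel recovery, RAS daemon
logging). It is only an illustration under stated assumptions: every
name in it (ras_severity, ras_stage, ras_event, ras_route_event) is
hypothetical and not taken from any kernel header, and the single
"in_preemptible_code" flag is a simplification in which an error hit
while running user-space code counts as preemptible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical severity classes, for illustration only. */
enum ras_severity {
	RAS_SEV_CORRECTED,	/* fully recovered, nothing left to do */
	RAS_SEV_RECOVERABLE,	/* needs in-kernel recovery work */
	RAS_SEV_FATAL,		/* cannot continue safely */
};

/* Hypothetical handling stages, from earliest to latest. */
enum ras_stage {
	RAS_STAGE_NMI_PANIC,		/* very early, atomic, NMI context */
	RAS_STAGE_KERNEL_RECOVERY,	/* e.g. code that calls memory_failure() */
	RAS_STAGE_DAEMON_LOG,		/* 'slow' policy action and logging */
};

/* Hypothetical event record; real events would carry much more. */
struct ras_event {
	enum ras_severity severity;
	bool in_preemptible_code;	/* assumption: user space counts as true */
};

/* Decide which stage should handle the event. */
enum ras_stage ras_route_event(const struct ras_event *ev)
{
	switch (ev->severity) {
	case RAS_SEV_FATAL:
		return RAS_STAGE_NMI_PANIC;
	case RAS_SEV_RECOVERABLE:
		/* Giving up is a fair answer for the non-preemptible case. */
		return ev->in_preemptible_code ? RAS_STAGE_KERNEL_RECOVERY
					       : RAS_STAGE_NMI_PANIC;
	case RAS_SEV_CORRECTED:
		return RAS_STAGE_DAEMON_LOG;
	}
	/* Unknown severity: be conservative and treat it as fatal. */
	return RAS_STAGE_NMI_PANIC;
}
```

The point of the sketch is only that the daemon sees nothing but the
last stage: the first two decisions are made before any user-space
instruction gets a chance to run.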