Hi Kamil,

# Cc'ed: Andi

On Thu, Dec 12, 2013 at 04:25:27PM -0600, Kamil Iskra wrote:
> Please find below a trivial patch that changes the sending of BUS_MCEERR_AO
> SIGBUS signals so that they can be handled by an arbitrary thread of the
> target process. The current implementation makes it impossible to create a
> separate, dedicated thread to handle such errors, as the signal is always
> sent to the main thread.

This can be done on the application side by letting the main thread create
a dedicated thread for error handling, or by waking up an existing, sleeping
one. It might not be optimal in terms of overhead, but note that an action
optional error does not need to be handled ASAP.
And we need only one process to handle an action optional error, so there
is no need to send SIGBUS(BUS_MCEERR_AO) to every process/thread.

> Also, do I understand it correctly that "action required" faults *must* be
> handled by the thread that triggered the error? I guess it makes sense for
> it to be that way, even if it circumvents the "dedicated handling thread"
> idea...

Yes. Unlike action optional errors, action required faults can happen on
all processes/threads which map the error-affected page, so in memory-error-
aware applications every thread must be able to handle
SIGBUS(BUS_MCEERR_AR) or just be killed.

> The patch is against the 3.12.4 kernel.
>
> --- mm/memory-failure.c.orig	2013-12-08 10:18:58.000000000 -0600
> +++ mm/memory-failure.c	2013-12-12 11:43:03.973334767 -0600
> @@ -219,7 +219,7 @@ static int kill_proc(struct task_struct
>  		 * to SIG_IGN, but hopefully no one will do that?
>  		 */
>  		si.si_code = BUS_MCEERR_AO;
> -		ret = send_sig_info(SIGBUS, &si, t);  /* synchronous? */
> +		ret = group_send_sig_info(SIGBUS, &si, t);  /* synchronous? */

Personally, I don't think we need this change, for the reason mentioned
above. Another concern is whether this change can affect or break existing
applications.
If it can, maybe you need to add (for example) a prctl attribute to show
whether the process expects the kernel to send SIGBUS(BUS_MCEERR_AO) only to
the main thread, or to all threads belonging to the process.

Thanks,
Naoya Horiguchi

>  	}
>  	if (ret < 0)
>  		printk(KERN_INFO "MCE: Error sending signal to %s:%d: %d\n",
>
> Thanks,
>
> Kamil
>
> --
> Kamil Iskra, PhD
> Argonne National Laboratory, Mathematics and Computer Science Division
> 9700 South Cass Avenue, Building 240, Argonne, IL 60439, USA
> phone: +1-630-252-7197  fax: +1-630-252-5986
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
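
For illustration, the application-side approach suggested above (the main
thread receives SIGBUS(BUS_MCEERR_AO) and wakes a dedicated thread) could
look roughly like the sketch below. It is not code from the patch or from
the kernel. It assumes a pipe is used as the wake-up channel, and every name
in it (ao_pipe, sigbus_handler, ao_worker, ao_forwarding_init,
ao_forwarding_demo) is hypothetical. The demo calls the handler directly
with a fabricated siginfo_t instead of provoking a real memory error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5 /* fallback: value used by the Linux UAPI headers */
#endif

/* Hypothetical wake-up channel: the handler writes, the worker reads. */
static int ao_pipe[2] = { -1, -1 };

/* Runs on whichever thread the kernel chose (the main thread for AO).
 * Only async-signal-safe calls are used here. */
static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;

    if (si->si_code == BUS_MCEERR_AO) {
        uintptr_t addr = (uintptr_t)si->si_addr;
        int saved_errno = errno;
        ssize_t n = write(ao_pipe[1], &addr, sizeof addr);

        (void)n; /* nothing useful can be done about a failure here */
        errno = saved_errno;
        return;
    }

    /* Anything else, including BUS_MCEERR_AR, is not forwarded in this
     * sketch: fall back to the default action (process termination). */
    signal(SIGBUS, SIG_DFL);
    raise(SIGBUS);
}

/* Dedicated thread: sleeps in read() until an address is forwarded. */
static void *ao_worker(void *arg)
{
    uintptr_t *seen = arg;
    uintptr_t addr;
    ssize_t n;

    do {
        n = read(ao_pipe[0], &addr, sizeof addr);
    } while (n < 0 && errno == EINTR);

    if (n == (ssize_t)sizeof addr)
        *seen = addr; /* a real application would recover the page here */
    return NULL;
}

static int ao_forwarding_init(void)
{
    struct sigaction sa;

    /* Writes of at most PIPE_BUF bytes to a pipe are atomic. */
    assert(sizeof(uintptr_t) <= PIPE_BUF);

    if (pipe(ao_pipe) != 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Simulated delivery: returns the address the worker saw, 0 on failure. */
uintptr_t ao_forwarding_demo(void *fake_addr)
{
    pthread_t tid;
    uintptr_t seen = 0;
    siginfo_t si;

    if (ao_forwarding_init() != 0)
        return 0;
    if (pthread_create(&tid, NULL, ao_worker, &seen) != 0)
        return 0;

    memset(&si, 0, sizeof si);
    si.si_signo = SIGBUS;
    si.si_code = BUS_MCEERR_AO;
    si.si_addr = fake_addr;
    sigbus_handler(SIGBUS, &si, NULL); /* stands in for the kernel */

    pthread_join(tid, NULL);
    close(ao_pipe[0]);
    close(ao_pipe[1]);
    return seen;
}
```

Calling ao_forwarding_demo(&some_object) from main() should return the
address of some_object once the worker has picked it up. The handler only
forwards the address because very little is async-signal-safe; the actual
recovery work is left to the dedicated thread, which is acceptable since an
action optional error does not have to be handled immediately.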