On Fri, Dec 13, 2013 at 02:59:02PM -0500, Naoya Horiguchi wrote: > Hi Kamil, > > # Cced: Andi > > On Thu, Dec 12, 2013 at 04:25:27PM -0600, Kamil Iskra wrote: > > Please find below a trivial patch that changes the sending of BUS_MCEERR_AO > > SIGBUS signals so that they can be handled by an arbitrary thread of the > > target process. The current implementation makes it impossible to create a > > separate, dedicated thread to handle such errors, as the signal is always > > sent to the main thread. > > This can be done in application side by letting the main thread create a > dedicated thread for error handling, or by waking up existing/sleeping one. > It might not be optimal in overhead, but note that an action optional error > does not require to be handled ASAP. > And we need only one process to handle > an action optional error, so no need to send SIGBUS(BUS_MCEERR_AO) for every > processes/threads. Sorry, let me correct the above: "We need only one thread (not one process) to handle an action optional error." Thanks, Naoya > > > Also, do I understand it correctly that "action required" faults *must* be > > handled by the thread that triggered the error? I guess it makes sense for > > it to be that way, even if it circumvents the "dedicated handling thread" > > idea... > > Yes. Unlike action optional errors, action required faults can happen on > all processes/threads which map the error affected page, so in memory error > aware applications every thread must be able to handle SIGBUS(BUS_MCEERR_AR) > or just be killed. > > > The patch is against the 3.12.4 kernel. > > > > --- mm/memory-failure.c.orig 2013-12-08 10:18:58.000000000 -0600 > > +++ mm/memory-failure.c 2013-12-12 11:43:03.973334767 -0600 > > @@ -219,7 +219,7 @@ static int kill_proc(struct task_struct > > * to SIG_IGN, but hopefully no one will do that? > > */ > > si.si_code = BUS_MCEERR_AO; > > - ret = send_sig_info(SIGBUS, &si, t); /* synchronous? */ > > + ret = group_send_sig_info(SIGBUS, &si, t); /* synchronous? */ > > Personally, I don't think we need this change for the above mentioned reason. > And another concern is if this change can affect/break existing applications. > If it can, maybe you need to add (for example) a prctl attribute to show that > the process expects kernel to send SIGBUS(BUS_MCEERR_AO) only to the main > thread, or to all threads belonging to the process. > > Thanks, > Naoya Horiguchi > > > } > > if (ret < 0) > > printk(KERN_INFO "MCE: Error sending signal to %s:%d: %d\n", > > > > Thanks, > > > > Kamil > > > > -- > > Kamil Iskra, PhD > > Argonne National Laboratory, Mathematics and Computer Science Division > > 9700 South Cass Avenue, Building 240, Argonne, IL 60439, USA > > phone: +1-630-252-7197 fax: +1-630-252-5986 > > > > -- > > To unsubscribe, send a message with 'unsubscribe linux-mm' in > > the body to majordomo@xxxxxxxxx. For more info on Linux MM, > > see: http://www.linux-mm.org/ . > > Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@xxxxxxxxx. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>