Re: [PATCH] mm/memory-failure.c: send action optional signal to an arbitrary thread

Hi Kamil,

# Cced: Andi

On Thu, Dec 12, 2013 at 04:25:27PM -0600, Kamil Iskra wrote:
> Please find below a trivial patch that changes the sending of BUS_MCEERR_AO
> SIGBUS signals so that they can be handled by an arbitrary thread of the
> target process.  The current implementation makes it impossible to create a
> separate, dedicated thread to handle such errors, as the signal is always
> sent to the main thread.

This can be done on the application side by letting the main thread create a
dedicated thread for error handling, or by waking up an existing sleeping one.
It might not be optimal in terms of overhead, but note that an action optional
error does not need to be handled ASAP. And we need only one process to handle
an action optional error, so there is no need to send SIGBUS(BUS_MCEERR_AO) to
every process/thread.
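
A rough userspace sketch of that handoff, assuming the signal lands in the
main thread as it does today: the SIGBUS handler only forwards the faulting
address through a pipe, and a dedicated thread picks it up. All ao_* names are
made up for illustration, and the fprintf() stands in for whatever recovery
the application actually does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5	/* Linux value, in case the libc headers lack it */
#endif

/* Hypothetical names: ao_pipe[1] is written by the signal handler,
 * ao_pipe[0] is read by the dedicated thread. */
static int ao_pipe[2] = { -1, -1 };

/* Runs in whichever thread receives SIGBUS. Uses only
 * async-signal-safe calls (write, signal). */
static void ao_sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
	int saved_errno = errno;
	uintptr_t addr = (uintptr_t)si->si_addr;
	ssize_t n;

	(void)ctx;
	if (si->si_code != BUS_MCEERR_AO) {
		/* Not an action optional error: restore the default action
		 * so that a repeated fault terminates the process. */
		signal(sig, SIG_DFL);
		return;
	}
	n = write(ao_pipe[1], &addr, sizeof(addr));
	(void)n;	/* nothing useful to do on failure inside a handler */
	errno = saved_errno;
}

/* Block until one error address has been queued; NULL on failure. */
static void *ao_wait_one(void)
{
	uintptr_t addr;
	ssize_t n;

	assert(ao_pipe[0] >= 0);	/* ao_setup() must have run */
	do {
		n = read(ao_pipe[0], &addr, sizeof(addr));
	} while (n < 0 && errno == EINTR);
	return n == (ssize_t)sizeof(addr) ? (void *)addr : NULL;
}

/* Body of the dedicated thread: handle errors one at a time. */
static void *ao_worker(void *arg)
{
	void *addr;

	(void)arg;
	while ((addr = ao_wait_one()) != NULL)
		fprintf(stderr, "action optional error at %p\n", addr);
	return NULL;
}

/* Create the pipe and install the handler; 0 on success. */
static int ao_setup(void)
{
	struct sigaction sa;

	if (pipe(ao_pipe) < 0)
		return -1;
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = ao_sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}

/* Start the dedicated thread; 0 on success. */
static int ao_start_worker(pthread_t *tid)
{
	return pthread_create(tid, NULL, ao_worker, NULL) ? -1 : 0;
}
```

The main thread would call ao_setup() and then ao_start_worker() once at
startup. The cost is one extra write()/read() pair per error, which should not
matter for an event that does not have to be handled immediately.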

> Also, do I understand it correctly that "action required" faults *must* be
> handled by the thread that triggered the error?  I guess it makes sense for
> it to be that way, even if it circumvents the "dedicated handling thread"
> idea...

Yes. Unlike action optional errors, action required faults can happen in any
process/thread that maps the error-affected page, so in memory-error-aware
applications every thread must be able to handle SIGBUS(BUS_MCEERR_AR), or
else be killed.
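
For illustration only, here is one way a handler could tell the two cases
apart. Signal dispositions are shared by all threads, so installing it once
covers every thread. The mce_* names and the per-thread recovery point are
assumptions of this sketch, not anything the kernel provides.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4	/* Linux values, in case the libc headers lack them */
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

enum mce_kind { MCE_NOT_MCE, MCE_ACTION_OPTIONAL, MCE_ACTION_REQUIRED };

/* Hypothetical per-thread state: where to resume after an action
 * required fault (NULL means this thread cannot recover). */
static _Thread_local sigjmp_buf *mce_recovery;
static _Thread_local void *mce_fault_addr;
static volatile sig_atomic_t mce_ao_pending;

static enum mce_kind mce_classify(const siginfo_t *si)
{
	switch (si->si_code) {
	case BUS_MCEERR_AO:
		return MCE_ACTION_OPTIONAL;
	case BUS_MCEERR_AR:
		return MCE_ACTION_REQUIRED;
	default:
		return MCE_NOT_MCE;
	}
}

/* Register the calling thread's recovery point (set with sigsetjmp). */
static void mce_enter_recoverable(sigjmp_buf *env)
{
	assert(env != NULL);
	mce_recovery = env;
}

static void mce_sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
	(void)ctx;
	switch (mce_classify(si)) {
	case MCE_ACTION_OPTIONAL:
		mce_ao_pending = 1;	/* not urgent, look at it later */
		return;
	case MCE_ACTION_REQUIRED:
		if (mce_recovery) {
			/* The faulting thread itself must get away from
			 * the poisoned page. */
			mce_fault_addr = si->si_addr;
			siglongjmp(*mce_recovery, 1);
		}
		/* fall through: this thread cannot recover */
	default:
		/* Restore the default action so that a repeated fault
		 * kills the process. */
		signal(sig, SIG_DFL);
		return;
	}
}

/* Install the handler for the whole process; 0 on success. */
static int mce_install(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = mce_sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

A thread that can recover would call sigsetjmp() with a nonzero savemask,
pass the buffer to mce_enter_recoverable(), and drop or reload the data
around mce_fault_addr when sigsetjmp() returns nonzero. A thread that never
registers a recovery point just gets killed.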

> The patch is against the 3.12.4 kernel.
> 
> --- mm/memory-failure.c.orig	2013-12-08 10:18:58.000000000 -0600
> +++ mm/memory-failure.c	2013-12-12 11:43:03.973334767 -0600
> @@ -219,7 +219,7 @@ static int kill_proc(struct task_struct
>  		 * to SIG_IGN, but hopefully no one will do that?
>  		 */
>  		si.si_code = BUS_MCEERR_AO;
> -		ret = send_sig_info(SIGBUS, &si, t);  /* synchronous? */
> +		ret = group_send_sig_info(SIGBUS, &si, t);  /* synchronous? */

Personally, I don't think we need this change, for the reason mentioned above.
Another concern is whether this change can affect/break existing applications.
If it can, maybe you need to add (for example) a prctl attribute to show
whether the process expects the kernel to send SIGBUS(BUS_MCEERR_AO) only to
the main thread, or to all threads belonging to the process.
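
For reference, prctl already has one memory-failure knob, PR_MCE_KILL, which
sets the kill policy for the calling thread. The attribute suggested above
does not exist; the sketch below only shows how an application uses the
existing interface, as a model of what such an opt-in could look like. The
mce_* function names are made up.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Ask for "early kill": SIGBUS(BUS_MCEERR_AO) is sent as soon as the
 * kernel detects the corruption, instead of waiting for an access.
 * Returns 0 on success, -1 with errno set on failure. */
static int mce_request_early_kill(void)
{
	return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}

/* Read back the current policy: PR_MCE_KILL_EARLY, PR_MCE_KILL_LATE
 * or PR_MCE_KILL_DEFAULT, or -1 on failure. */
static int mce_get_policy(void)
{
	return prctl(PR_MCE_KILL_GET, 0, 0, 0, 0);
}
```

An application that does not call this keeps the system-wide default, so
existing programs see no change in behaviour. A new attribute for choosing
the target thread could follow the same pattern.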

Thanks,
Naoya Horiguchi

>  	}
>  	if (ret < 0)
>  		printk(KERN_INFO "MCE: Error sending signal to %s:%d: %d\n",
> 
> Thanks,
> 
> Kamil
> 
> -- 
> Kamil Iskra, PhD
> Argonne National Laboratory, Mathematics and Computer Science Division
> 9700 South Cass Avenue, Building 240, Argonne, IL 60439, USA
> phone: +1-630-252-7197  fax: +1-630-252-5986
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
> 
