On Fri, Dec 13, 2013 at 14:59:02 -0500, Naoya Horiguchi wrote: Hi Naoya, > On Thu, Dec 12, 2013 at 04:25:27PM -0600, Kamil Iskra wrote: > > Please find below a trivial patch that changes the sending of BUS_MCEERR_AO > > SIGBUS signals so that they can be handled by an arbitrary thread of the > > target process. The current implementation makes it impossible to create a > > separate, dedicated thread to handle such errors, as the signal is always > > sent to the main thread. > This can be done in application side by letting the main thread create a > dedicated thread for error handling, or by waking up existing/sleeping one. > It might not be optimal in overhead, but note that an action optional error > does not require to be handled ASAP. And we need only one process to handle > an action optional error, so no need to send SIGBUS(BUS_MCEERR_AO) for every > processes/threads. I'm not sure if I understand. "letting the main thread create a dedicated thread for error handling" is exactly what I was trying to do -- the problem is that SIGBUS(BUS_MCEERR_AO) signals are never sent to that thread, which is contrary to common expectations. The signals are sent to the main thread only, even if SIGBUS is masked there. Just to make sure that we're on the same page, here's a testcase that demonstrates the problem I'm trying to fix (I should've sent it the first time; sorry for being lazy): #include <pthread.h> #include <signal.h> #include <stdio.h> #include <stdlib.h> #include <sys/prctl.h> #include <unistd.h> void sigbus_handler(int sig, siginfo_t* si, void* ucontext) { printf("SIGBUS caught by thread %ld, code %d, addr %p\n", (long)pthread_self(), si->si_code, si->si_addr); } void* sigbus_thread(void* arg) { printf("sigbus thread: %ld\n", (long)pthread_self()); for (;;) pause(); } int main(void) { struct sigaction sa; sigset_t mask; char* buf; pthread_t thread_id; prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0); posix_memalign((void*)&buf, 4096, 4096); buf[0] = 0; printf("convenient address to hard offline: %p\n", buf); sa.sa_sigaction = sigbus_handler; sigemptyset(&sa.sa_mask); sa.sa_flags = SA_SIGINFO; sigaction(SIGBUS, &sa, NULL); pthread_create(&thread_id, NULL, sigbus_thread, NULL); sigemptyset(&mask); sigaddset(&mask, SIGBUS); pthread_sigmask(SIG_BLOCK, &mask, NULL); printf("main thread: %ld\n", (long)pthread_self()); for (;;) pause(); return 0; } This testcase uses a very common signal handling strategy in multithreaded programs: masking signals in all threads but one, created specifically for signal handling. It works just fine if I send it SIGBUS from another terminal using "kill". It does not work if I offline the page: the signal is routed to the main thread, where it's marked as pending; nothing gets printed out. As you were so kind to point out, SIGBUS(BUS_MCEERR_AO) does not need to be handled ASAP, so why should the kernel handle it differently to other non-critical signals? The current behavior seems inconsistent, and there is no convenient workaround (as a library writer, I have no control over the actions of the main thread). > And another concern is if this change can affect/break existing applications. > If it can, maybe you need to add (for example) a prctl attribute to show that > the process expects kernel to send SIGBUS(BUS_MCEERR_AO) only to the main > thread, or to all threads belonging to the process. I understand your concern. However, I believe that having SIGBUS(BUS_MCEERR_AO) behave consistently with established POSIX standards for signal handling outhweighs the concerns over potential incompatibilities, especially with a feature that is currently used by a very small subset of applications. Thus, I kindly ask you to reconsider. Regards, Kamil -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>