On 03/20, Oleg Nesterov wrote: > > On 03/20, Joseph Salisbury wrote: > > > > There was some testing done with your test kernel. The data collected > > while running your kernel is available in the bug report: > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705/comments/58 > > Joseph, thanks a lot. > > I'll try to read the logs tomorrow, but at first glance Tetsuo was right, > I do not see a "long" sleep in that log. Yes, it seems that it actually needs > 30 secs. It spends most of the time (30.13286 seconds) in msleep+0x20/0x30 WaitForDoorbellInt+0x103/0x130 [mptbase] WaitForDoorbellReply+0x42/0x220 [mptbase] mpt_handshake_req_reply_wait+0x1dc/0x2c0 [mptbase] SendPortEnable.constprop.23+0x94/0xc0 [mptbase] WaitForDoorbellInt() does msleep(1) in a loop. This trace starts at the line 6001, and it is repeated 3792 times, till the line 176686 which apparently shows the trace of the 2nd WaitForDoorbellInt() in WaitForDoorbellReply(). SendPortEnable: if (ioc->ir_firmware || ioc->bus_type == SAS) { rc = mpt_handshake_req_reply_wait(ioc, req_sz, (u32*)&port_enable, reply_sz, (u16*)&reply_buf, 300 /*seconds*/, sleepFlag); } else { rc = mpt_handshake_req_reply_wait(ioc, req_sz, (u32*)&port_enable, reply_sz, (u16*)&reply_buf, 30 /*seconds*/, sleepFlag); } I am wondering which branch calls mpt_handshake_req_reply_wait(), the else's timeout=30 (passed to the 1st WaitForDoorbellInt) suspiciously matches the time WaitForDoorbellInt() actually runs... But everything works fine and at first glance the potential timeout error should be propogated correctly. So "timeout" is probably 300. And probably this all is fine. All I can suggest is the dirty hack for now. User-space should be changed too, I think, but this is another story. Tetsuo, what do you think? Oleg. --- --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -5395,6 +5395,8 @@ static struct pci_driver mptsas_driver = { #endif }; +#include <linux/signal.h> + static int __init mptsas_init(void) { @@ -5424,7 +5426,31 @@ mptsas_init(void) mpt_event_register(mptsasDoneCtx, mptsas_event_process); mpt_reset_register(mptsasDoneCtx, mptsas_ioc_reset); - error = pci_register_driver(&mptsas_driver); + { + sigset_t full, save; + /* + * KILL ME. THIS IS THE DIRTY AND WRONG HACK WAITING FOR THE + * FIX FROM MAINTAINERS. + * + * - This driver needs a lot of time to complete SendPortEnable() + * but systemd-udevd sends SIGKILL after 30 seconds, see + * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705 + * + * Probably user-space should be changed, but: + * + * - Since commit 786235eeba0e "kthread: make kthread_create() + * killable" scsi_host_alloc() becomes killable and this SIGKILL + * crashes the kernel. + * + * If scsi_host_alloc() fails mptsas_probe() blindly calls + * mptscsih_remove() and scsi_remove_host() hits host == NULL. + */ + sigfillset(&full); + sigprocmask(SIG_SETMASK, &full, &save); + error = pci_register_driver(&mptsas_driver); + sigprocmask(SIG_SETMASK, &save, NULL); + } + if (error) sas_release_transport(mptsas_transport_template); -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html