On Sat, 2014-03-22 at 20:25 +0100, Oleg Nesterov wrote: > On 03/22, Tetsuo Handa wrote: > > > > Oleg Nesterov wrote: > > > Tetsuo, what do you think? > > > > I don't like blocking SIGKILL while doing operations that depend on memory > > allocation by other processes. If the OOM killer is triggered and it chose > > the process blocking SIGKILL in mptsas_init() (I know it unlikely happens), > > it generates the OOM killer deadlock. > > It seems that we do not understand each other. > > I do not like this hack too. And it is even wrong, you can't really block > SIGKILL. And it is unnecessary in a sense that (I think) it is fine that > module_init() reacts to SIGKILL and aborts, just the fact it crashes the > kernel in the error paths is not fine. > > The driver should be fixed anyway. As for timeout, either userspace/systemd > should be changed to not send SIGKILL after 30 secs, or (better) the driver > should be changed to not waste 30 secs. > > The hack I sent can only serve as a short term solution, it should be > reverted once we have something better. And, otoh, this hack only affects > the problematic driver which should be fixed in any case. > > Do you see my point? I can be wrong, but I think that you constantly > misunderstand the intent. > > > My preference is to fix kthread_create() to handle SIGKILL gracefully. > > And now I do not understand you too. I do not understand why should we > "fix" kthread_create(). > > > Many kernel operations (e.g. mutex_lock() wait_event() wait_for_completion()) > > ignore SIGKILL and the current users depend on SIGKILL being ignored. Thus, > > commit 786235ee sounds like a kernel API breakage. > > Personally I do not really think so, but OK. In this case lets revert > 786235ee. > > > Commit 786235ee "kthread: make kthread_create() killable" changed to > > leave kthread_create() as soon as receiving SIGKILL. But this change > > caused boot failures because systemd-udevd receives SIGKILL due to timeout > > while loading SCSI controller drivers using finit_module() [1]. > > And I still think that 786235ee simply uncovered the problems in this > driver. Perhaps we should change kthread_create() for some reason, but > (imho) not because we need to help the buggy code. > > > Therefore, this patch changes kthread_create() from "wait forever in > > killable state" to "wait for 10 seconds in unkillable state, check for > > the OOM killer every second". > > Personally I dislike this change. In fact I think it is ugly. But this > is only my opinion. > > If you convince someone to take this patch I won't argue. OK, the fix from the SCSI point of view is to make the mpt teardown path actually work. The whole thing looks to be a complete mess because there's another place where it will do the wrong thing in mptscsih_remove(). You always have to call mpt_detach() otherwise the device doesn't get removed from the lists. In theory this patch fixes both bugs in the driver. James --- diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c index 727819c..282d39a 100644 --- a/drivers/message/fusion/mptscsih.c +++ b/drivers/message/fusion/mptscsih.c @@ -1176,10 +1176,14 @@ mptscsih_remove(struct pci_dev *pdev) MPT_SCSI_HOST *hd; int sz1; + if (!host) + /* not brought up far enough to do scsi_host_attach() */ + goto out; + scsi_remove_host(host); if((hd = shost_priv(host)) == NULL) - return; + goto out; mptscsih_shutdown(pdev); @@ -1203,6 +1207,7 @@ mptscsih_remove(struct pci_dev *pdev) scsi_host_put(host); + out: mpt_detach(pdev); } -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html