On 03/19/2014 07:49 AM, Tetsuo Handa wrote:
> Oleg Nesterov wrote:
>>>> If we need the urgent hack to fix the regression, then I suggest to change
>>>> scsi_host_alloc() temporary until mptsas (or whatever) is fixed.
>>> Device initialization taking longer than 30 seconds is possible and is not a
>>> hang up. It is systemd which needs to be fixed.
>> Perhaps systemd needs the fix too, I do not know. But this is irrelevant,
>> I think. Or at least this should be discussed separately.
> I confirmed that this problem goes away if systemd-udevd supports longer
> timeout.
>
>> kthread_run() can fail anyway, mptsas_probe() should not crash the kernel.
> Right. But mptsas_probe() triggering an OOPS is irrelevant to kthread_run()
> ( comment #27 ).
>
>> And btw, it is not clear to me if in this case device initialization really
>> needs more than 30 seconds... My understanding is probably wrong, so please
>> correct me. But it seems that before your "make kthread_create() killable"
>>
>>  - probe hangs
>>
>>  - SIGKILL wakes it up
>>
>>  - so I assume that the probe was interrupted and didn't finish
>>    correctly ???
>>
>>  - initialization continues, does scsi_host_alloc(), etc, and
>>    everything works fine even if probe was interrupted?
>>
> I confirmed that device initialization really took more than 30 seconds
> ( comments #51 and #52 ).
>
>> So perhaps that probe should not hang and this should be fixed too ?
>> Do you know where exactly it hangs? And where it is woken up by SIGKILL ?
>> Or I totally misunderstood ?
> The probe did not hang. SIGKILL affected only wait_for_completion_killable()
> in kthread_create_on_node() called by mptsas_probe() via scsi_host_alloc().
> Thus, the probe was interrupted because kthread_run() returned an error.
>
>>> I think we need a bit different version, in order to take TIF_MEMDIE flag into
>>> account at the caller of kthread_create(), for the purpose of commit 786235ee
>>> is "try to die as soon as possible if chosen by the OOM killer".
>>>
>>>     for (;;) {
>>>         shost->ehandler = kthread_run(scsi_error_handler, shost,
>>>                                       "scsi_eh_%d", shost->host_no);
>>>         if (PTR_ERR(shost->ehandler) != -EINTR ||
>>>             test_thread_flag(TIF_MEMDIE))
>> Well, personally I do not care about TIF_MEMDIE/oom at all. We need the
>> temporary hack (unless we have the "right" fix right now) which should be
>> reverted later.
> I do seriously care about TIF_MEMDIE/oom. Last week I respond to a trouble
> which hit "kernel: request_module() OOM local DoS" (RHBZ #853474) without
> any malice.
>
>> Not sure I understand... Yes, wait_for_completion_killable() can return
>> immediately if TIF_SIGPENDING will be set again for any reason. Say, another
>> signal. But the next iteration will clear TIF_SIGPENDING ?
>>
>>> As I think it is difficult to prove that kmalloc(GFP_KERNEL) never sets
>>> TIF_SIGPENDING flag
>> Ah, I see, you mean that kmalloc() can do this every time. No, this should
>> not happen or we have another problem.
> Then, what happens if somebody does
>
>     while (1)
>         kill(pid, SIGKILL);
>
> where pid is the process calling kthread_run() from the "for (;;)" loop in
> scsi_host_alloc()? Theoretically, it will form an infinite retry loop.
> Clearing TIF_SIGPENDING does not guarantee that next
> wait_for_completion_killable() does not return immediately.
> Doing retry decision at scsi_host_alloc() will make things worse than
> doing it at kthread_create_on_node().
>
>> Anyway. I agree with any hack in scsi_host_alloc/etc, this is up to
>> maintainers. I still think that your change uncovered the problems in
>> drivers/message/fusion/, these problems should be fixed somehow.
>>
>> Dear maintainers, we need your help.
>>
> Right.
> We found that we can fix this problem by updating systemd-udevd to
> support longer timeout ( comment #53 ). Joseph, would you consult systemd
> maintainers?

Thanks everyone for reviewing this bug. Message sent to systemd mailing
list:

http://lists.freedesktop.org/archives/systemd-devel/2014-March/018006.html
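
For readers following the retry-loop argument quoted above, here is a rough
userspace toy model of it. This is only a sketch and not kernel code: every
name in it (model_reset, model_killable_wait, model_retry, the two flags) is
hypothetical, the "pending signal" is just a boolean standing in for
TIF_SIGPENDING, and the attempt cap exists only so that the model terminates.
It assumes, as the quoted argument does, that a pending signal makes the
killable wait fail with -EINTR and that the caller clears the flag before
retrying.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static bool sigpending_flag;   /* models TIF_SIGPENDING */
static bool sender_active;     /* models "while (1) kill(pid, SIGKILL);" */

/* Set up one scenario: initial pending state and whether signals keep coming. */
static void model_reset(bool pending, bool sender)
{
    sigpending_flag = pending;
    sender_active = sender;
}

/* Models a killable wait: fails with -EINTR whenever a signal is pending. */
static int model_killable_wait(void)
{
    if (sender_active)
        sigpending_flag = true;    /* a fresh signal arrives before each wait */
    return sigpending_flag ? -EINTR : 0;
}

/*
 * Retry on -EINTR in the style of the quoted for (;;) loop, but capped.
 * Returns the number of attempts used, or -1 if the cap was reached.
 */
static int model_retry(int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (model_killable_wait() != -EINTR)
            return attempt;
        sigpending_flag = false;   /* clear the flag, then try again */
    }
    return -1;
}
```

In this model, model_reset(true, false) followed by model_retry(1000) returns
2: a single signal costs one failed attempt and the retry then succeeds. With
model_reset(false, true), model_retry(1000) returns -1, because clearing the
flag never helps while new signals keep arriving. An uncapped loop would spin
forever in that second case, which is the "infinite retry loop" concern.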