Quoting Oren Laadan (orenl@xxxxxxxxxxxxxxx): > > > Serge E. Hallyn wrote: > > Quoting Oren Laadan (orenl@xxxxxxxxxxxxxxx): > >> > >> Serge E. Hallyn wrote: > >>> (This is a patch against the checkpoint/restart kernel tree at > >>> http://git.ncl.cs.columbia.edu/?p=linux-cr.git;a=shortlog;h=refs/heads/ckpt-v19-rc2.9) > >>> > >>> On x86, do_signal() leaves -516 in eax while it freezes, which > >>> sys_restart() can use to detect that it should restart the > >>> syscall which was interrupted by a signal (or the freezer). > >>> > >>> On s390, gprs[2] gets tweaked to -EINTR (-4) instead, leaving > >>> us no reliable way to tell whether should be restarted. If the > >>> task is checkpointed here and then restarted, then the last part > >>> of do_signal() won't be done, and the interrupted syscall won't > >>> be restarted. > >>> > >>> This patch defines TIF_RESTARTBLOCK as a thread flag showing that > >>> the thread expects to be frozen while kicked out of a restartable > >>> syscall by a signal. > >>> > >>> The TIF_RESTARTBLOCK flag is only set for the duration of the > >>> get get_signal_to_deliver() which is where the task may get > >>> frozen. We also set it in sys_restart() if the checkpointed task > >>> had had TIF_RESTARTBLOCK set. It will get cleared if upon exiting > >>> sys_restart() we jump to sysc_sigpending. If instead we jump back > >>> to do_signal(), then TIF_RESTARTBLOCK will stay set again until > >>> after get_signal_to_deliver() so that if it immediately freezes and > >>> is re-checkpointed, the resulting second checkpoint image again > >>> will have TIF_RESTARTBLOCK set. > >> Two comments: > >> > >> 1) This note will be lost once we fold this patch into a clean > >> patchset. Can you please make it part of the code ? > > > > Sure, good idea. > > > >> 2) Maybe I'm missing something, but I'm not convinced. Can you > >> elaborate on why this is correct in different cases ? Also, in > >> particular with respect to the post-signal-sent snippet in the > >> signal handling code: > >> > >> if (retval == -ERESTART_RESTARTBLOCK > >> && regs->psw.addr == continue_addr) { > >> > >> regs->gprs[2] = __NR_restart_syscall; > >> > >> set_thread_flag(TIF_RESTART_SVC); > >> > >> } > >> > >> > >> Would it do what you expect after a restart ? (restart modifies > >> the psw.addr) > > > > I don't understand the question. After sys_restart(), we won't be > > returning to this kernel code. We'll either immediately call > > restart_syscall(), or, if a signal was delivered before sys_restart(), > > completed, then do_signal() will start again from the top. > > Ok, I re-read the code: let's look at these cases: > > case 1: checkpointee wasn't in syscall -- no problem. > > case 2: checkpointee was in syscall, no signal pending; when it was > frozen, regs->svcnr became 0, and that's what we save, so on restart > we won't enter that snippet again. Again, no problem. > > case 3: checkpointee was in syscall, signal pending; > case 4: checkpointee was in syscall, signal received at restart; > look at this snippet: > > if (signr > 0 && regs->psw.addr == restart_addr) { > if (retval == -ERESTARTNOHAND > || (retval == -ERESTARTSYS > && !(current->sighand->action[signr-1].sa.sa_flags > & SA_RESTART))) { > regs->gprs[2] = -EINTR; > regs->psw.addr = continue_addr; > } > } > > Because svcnr is/was 0, neither restart_addr nor continue_addr > were setup, so this condition is always false, which I think is > wrong. I've been focusing on the ERESTART_RESTARTBLOCK case. Can we agree that all cases appear to be handled correctly there? For the ERESTARTSYS/ERESTARTNOHAND case, I'm probably not doing the right thing. For a single checkpoint, since either there was no real signal (freezer) or it didn't get handled before checkpoint, psw.addr gets checkpointed and restored as restart_addr, which is the right thing. (since signr is not >0, we would have kept the values the same after get_signal_to_deliver()). But if a real signal gets delivered upon exit of sys_restart(), then I think I do think we'll end up doing the wrong thing - we'll restart the interrupted system call with the orig_gpr2, so we'll pretend the signal did not get delivered, rather than proceed past the call to the system call (in userspace) with return value -EINTR. (Just how wrong is that?) This is all dense enough that it may be worth thinking of a different way to handle it, but I'm not sure what that way would be. The challenge is finding a *simple*, reliable way to detect what the the initial conditions to do_signal() where, based on the register/thread_info values as they are at do_signal()->get_signal_to_deliver()->try_to_freeze(), given the ways the values get swapped in the block above the get_signal_to_deliver() call. The simplest thing by far would be if we could safely move the get_signal_to_deliver() call before the if (regs->svcnr) { continue_addr = regs->psw.addr; ... block. I assume there are entry_64.S-related reasons why we cannot? > Also, if the signal arrives _after_ the restart > completes... ? > case 5: receives a signal during restart -- restart should fail. > > Oren. > > > > > In the first case we're doing exactly what we wanted to. > > > > In that second case, we enter do_signal with very different > > initial conditions than the checkpointed case: regs->svcnr is 0, > > so none of the gprs[2] or svcnr or psw-addr tweaking that > > would have happened the first time will happen. We'll just > > handle the signal (if any), then, upon exit of do_signal, > > proceed again with regs->gprs[2] == __NR_restart_syscall. > > > > But, since thread_info_flags->TIF_RESTARTBLOCK is set, > > if we get frozen and checkpointed again during the > > get_signal_to_deliver(), a restart of that image should > > be exactly the same as a restart of the current image. > > > > (That, at least, is my intent and understanding :) > > > > -serge > > -- To unsubscribe from this list: send the line "unsubscribe linux-s390" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html