Serge E. Hallyn wrote: > Quoting Oren Laadan (orenl@xxxxxxxxxxxxxxx): >> >> Serge E. Hallyn wrote: >>> Quoting Oren Laadan (orenl@xxxxxxxxxxxxxxx): >>>> Serge E. Hallyn wrote: >>>>> (This is a patch against the checkpoint/restart kernel tree at >>>>> http://git.ncl.cs.columbia.edu/?p=linux-cr.git;a=shortlog;h=refs/heads/ckpt-v19-rc2.9) >>>>> >>>>> On x86, do_signal() leaves -516 in eax while it freezes, which >>>>> sys_restart() can use to detect that it should restart the >>>>> syscall which was interrupted by a signal (or the freezer). >>>>> >>>>> On s390, gprs[2] gets tweaked to -EINTR (-4) instead, leaving >>>>> us no reliable way to tell whether should be restarted. If the >>>>> task is checkpointed here and then restarted, then the last part >>>>> of do_signal() won't be done, and the interrupted syscall won't >>>>> be restarted. >>>>> >>>>> This patch defines TIF_RESTARTBLOCK as a thread flag showing that >>>>> the thread expects to be frozen while kicked out of a restartable >>>>> syscall by a signal. >>>>> >>>>> The TIF_RESTARTBLOCK flag is only set for the duration of the >>>>> get get_signal_to_deliver() which is where the task may get >>>>> frozen. We also set it in sys_restart() if the checkpointed task >>>>> had had TIF_RESTARTBLOCK set. It will get cleared if upon exiting >>>>> sys_restart() we jump to sysc_sigpending. If instead we jump back >>>>> to do_signal(), then TIF_RESTARTBLOCK will stay set again until >>>>> after get_signal_to_deliver() so that if it immediately freezes and >>>>> is re-checkpointed, the resulting second checkpoint image again >>>>> will have TIF_RESTARTBLOCK set. >>>> Two comments: >>>> >>>> 1) This note will be lost once we fold this patch into a clean >>>> patchset. Can you please make it part of the code ? >>> Sure, good idea. >>> >>>> 2) Maybe I'm missing something, but I'm not convinced. Can you >>>> elaborate on why this is correct in different cases ? Also, in >>>> particular with respect to the post-signal-sent snippet in the >>>> signal handling code: >>>> >>>> if (retval == -ERESTART_RESTARTBLOCK >>>> && regs->psw.addr == continue_addr) { >>>> >>>> regs->gprs[2] = __NR_restart_syscall; >>>> >>>> set_thread_flag(TIF_RESTART_SVC); >>>> >>>> } >>>> >>>> >>>> Would it do what you expect after a restart ? (restart modifies >>>> the psw.addr) >>> I don't understand the question. After sys_restart(), we won't be >>> returning to this kernel code. We'll either immediately call >>> restart_syscall(), or, if a signal was delivered before sys_restart(), >>> completed, then do_signal() will start again from the top. >> Ok, I re-read the code: let's look at these cases: >> >> case 1: checkpointee wasn't in syscall -- no problem. >> >> case 2: checkpointee was in syscall, no signal pending; when it was >> frozen, regs->svcnr became 0, and that's what we save, so on restart >> we won't enter that snippet again. Again, no problem. >> >> case 3: checkpointee was in syscall, signal pending; >> case 4: checkpointee was in syscall, signal received at restart; >> look at this snippet: >> >> if (signr > 0 && regs->psw.addr == restart_addr) { >> if (retval == -ERESTARTNOHAND >> || (retval == -ERESTARTSYS >> && !(current->sighand->action[signr-1].sa.sa_flags >> & SA_RESTART))) { >> regs->gprs[2] = -EINTR; >> regs->psw.addr = continue_addr; >> } >> } >> >> Because svcnr is/was 0, neither restart_addr nor continue_addr >> were setup, so this condition is always false, which I think is >> wrong. > > I've been focusing on the ERESTART_RESTARTBLOCK case. Can > we agree that all cases appear to be handled correctly there? Errr... everytime I need to go and read all the code from scratch to convince myself... and I my mind just context switched :( > > For the ERESTARTSYS/ERESTARTNOHAND case, I'm probably not > doing the right thing. For a single checkpoint, since either > there was no real signal (freezer) or it didn't get handled > before checkpoint, psw.addr gets checkpointed and restored > as restart_addr, which is the right thing. (since signr is > not >0, we would have kept the values the same after > get_signal_to_deliver()). > > But if a real signal gets delivered upon exit of sys_restart(), > then I think I do think we'll end up doing the wrong thing - > we'll restart the interrupted system call with the orig_gpr2, > so we'll pretend the signal did not get delivered, rather > than proceed past the call to the system call (in userspace) > with return value -EINTR. (Just how wrong is that?) > > This is all dense enough that it may be worth thinking of > a different way to handle it, but I'm not sure what that > way would be. The challenge is finding a *simple*, reliable > way to detect what the the initial conditions to do_signal() > where, based on the register/thread_info values as they are > at do_signal()->get_signal_to_deliver()->try_to_freeze(), > given the ways the values get swapped in the block above > the get_signal_to_deliver() call. > > The simplest thing by far would be if we could safely > move the get_signal_to_deliver() call before the > > if (regs->svcnr) { > continue_addr = regs->psw.addr; > ... > > block. I assume there are entry_64.S-related reasons why > we cannot? Yes, that's what I was thinking. And I don't know enough of s390 to understand why we could not. Either that or record the state before it is modified in dedicated checkpoint-related fields (e.g. per thread). Oren. _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers