On Thu, 21 Jan 2010, Serge E. Hallyn wrote: > Quoting Oren Laadan (orenl@xxxxxxxxxxxxxxx): > > Hi, > > > > I'm not sure thix fix is totally correct. > > > > First, a reminder of how -ERESTART... errors behave: > > > > 1) If a (user) signal handler was *not* called then: > > -ERESTARTSYS: re-execute syscall > > -ERESTARTNOHAND: re-execute syscall > > -ERESTARTNOINTR: re-execute syscall > > -ERESTART_RESTARTBLOCK: arrange to call sys_restart_syscall > > > > 2) If a (user) signal handler *was* called then: > > -ERESTARTSYS: re-execute syscall iff SA_RESTART (else EINTR) > > -ERESTARTNOHAND: return -EINTR > > -ERESTARTNOINTR: re-execute syscall > > -ERESTART_RESTARTBLOCK: return -EINTR > > > > The difference between x86 and s390 is _when_ (and how) the task state is > > altered (to arrange for re-execution), in case of task freezing: > > > > * In x86, changes occur _after_ the task is frozen, and the work is split > > between handle_signal() and do_signal(). > > > > * In s390, some changes occur _before_ the task is frozen, and possibly > > undone after the freeze if a signal handler was called. All work is done > > is do_signal(). Because of that, our checkpoint only sees a partial view > > of the task's state. > > > > Now back to the proposed fix: it isn't sufficient because: > > > > (a) If the process has a signal pending when checkpointed, then the state > > already reflects some changes by do_signal(). If the signal leads to a > > (user) signal handler after restart, then do_signal() (after restart) will > > _not_ undo the changes originally done before the checkpoint. > > > > This issue holds for ERESTARTSYS, ERESTARTNOHAND and ERESTART_RESTARTBLOCK. > > > > Note that do_signal() after restart will not do the changes again either > > in case of -ERESTART.... value, because the regs->svcnr was set to 0 and > > recorded that way. > > > > (b) Same, but also if the process didn't have a signal at checkpoint > > time, but will have one during/after restart but before resuming. > > > > (c) Because do_restart() selects its return value from gprs[2] (upon > > successful completion), then when it tries to resume userspace and > > enters do_signal() it will have -EINTR (which isn't the original return > > value!), and therefore not set your new TIF_RESTARTBLOCK bit. If the task > > is now checkpointed again, that state will be lost. > > Sorry if I'm misunderstanding what you're saying, but the return value > after exiting sys_restart() won't be -EINTR now, bc my patch sets it to > __NR_restart_syscall. True ... but the consequences are the same: Consider task A is checkpointed, then restarted, but before it completely resumes userspace, it is checkpointed again. For the second checkpoint, do_signal() will see svcnr==0, and even if you set svcnr, then gprs[2]>0. Therefore, do_signal() will not set (again) the TIF_RESTARTBLOCK flag because that happens only for svcnr!=0 and gprs[2]=-ERESTART_RESTARTBLOCK. So the second checkpoint image lost that particular state you were interested. Therefore, when you restart from that image, the gprs[2] will be set correctly (if ignoring issues a, b), but the TIF_RESTART_SVC won't be set again. > > > As a side note: I noticed that on x86 there are {checkpoint,restore}_thread() > > that save the thread flags and restore the necessary flags. They also > > check the flags - at checkpoint to ensure the task is checkpointable, and > > at restore for sanity. Is there not a need for something similar for s390? > > > > That would be an appropriate place to svae and restore a TIF_RESTARTBLOCK > > thread flag and fix (c) above, should you stick to that method. > > We actually want that flag cleared - the flag is only there so that > checkpoint can tell that it needs to set the 'should_restart' flag in > the thread info in the checkpoint image. sys_estart does not then reset > that flag, it instead does the steps which are done in the last block > of do_signal(): set gprs[2] to NR_syscall_restart and add the TIF_RESTART_SVC > thread flag. The problem is that sys_restart *does not even set* that flag, and do_signal() will not set that flag either after restarting from a checkpoint that had should_restart set. Therefore a subsequent checkpoint that occurs _before_ the process really resumes to userspace will not carry that information. Oren. _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers