On 07/30/2010 08:35 PM, Sukadev Bhattiprolu wrote: > Oren Laadan [orenl@xxxxxxxxxxxxxxx] wrote: >> >> >> john stultz wrote: >>> On Fri, 2010-07-30 at 15:45 -0400, Oren Laadan wrote: >>>> Sukadev Bhattiprolu wrote: >>>>> Oren Laadan [orenl@xxxxxxxxxxxxxxx] wrote: >>>>> | | | > h->fl_type = lock->fl_type; >>>>> | > + h->fl_type_prev = lock->fl_type_prev; >>>>> | > h->fl_flags = lock->fl_flags; >>>>> | > + if (h->fl_type & F_INPROGRESS && | > >>>>> + (lock->fl_break_time > jiffies)) >>>>> | > + h->fl_rem_lease = (lock->fl_break_time - jiffies) / HZ; >>>>> | | Hmmm -- we have a ctx->ktime_begin marking the start of the >>>>> checkpoint. >>>>> | It is used for relative-time calculations, for example, the expiry of >>>>> | restart-blocks and timeouts. I suggest that we use it here too to be >>>>> | consistent. >>>>> >>>>> Well, I started off using ktime_begin but discussed this with John Stultz >>>>> (CC'd here) who pointed out that mixing different domains of time is likely >>>>> to cause errors. ktime is an absolute time and fl_break_time uses jiffies - >>>>> a relative time. >>>>> >>>>> I think use of ktime_begin for restart_blocks is fine (since they use >>>>> ktime_t) but using ktime_t for file leases and converting between jiffies >>>>> and nanoseconds could be a problem, unless we convert fl_break_time to >>>>> seconds. >>>>> >>>>> IOW, can we leave the above computation of ->fl_rem_lease for now ? >>>> The data on restart_blocks keep relative time - it's the the time >>>> to expiry relative to ktime_begin (which is absolute). >>>> >>>> ktime_begin merely gives a reference point in time against which >>>> all other time-related values should be saved. The advantage is >>>> that all time computation are relative to the same point in time >>>> at checkpoint/restart - the time when the ktime_begin is set. It's >>>> more consistent this way. >>> >>> First, forgive me for not being very aware of the checkpoint/restart >>> code. >> >> On the contrary, forgive me if I'm stating the obvious below ... >> >>> >>> So, ktime_begin is an absolute CLOCK_MONOTONIC time, relative to the >>> time the system booted (more or less). And it represents the checkpoint >>> time, correct? >> >> As a rule, all time measurements in the checkpoint image are >> saved as relative values, using the start-of-checkpoint as the >> reference point in time (*). >> >> So at checkpoint, every absolute time value should be converted >> to a value relative to the start-of-checkpoint. At restart, every >> relative time value from the image is converted back (if needed) >> to an absolute time value using a respective start-of-restart. > > One general observation, slightly off-topic. You mention > "start-of-restart" here and ... >> >> This makes sense for the most common case, where if a process >> had 5 more seconds to sleep at the time of checkpoint, we would >> like it to have 5 more seconds to sleep after it restarts. > > ... "after it restarts" here. These two can be quite different if, > as you mention below, the C/R is a lengthy operation. > > If application had 5 seconds remaining on the lease, and C/R takes > more than 5 seconds, the application would have spent the remaining > lease frozen. Of course trying to compute the values relative to the > "end-of-restart" is theoretically impossible :-). > > Maybe the user can adjust the lease-expiry in the checkpoint-image > to allow for this as well. Heh ... you cannot pre-adjust it since you don't know how long restart would take :) It would certainly be better if we could restore all the expiry times to be relative to the time when the restart _succeeds_ (and just before all tasks resume execution), as opposed to when the restart _begins_. Restart could take long time, and by the time it concludes leases (and timers) could expire. However, to implement such behavior, we'll need to keep a list of expiry-values-to-adjust throughout the restart, then when restart is about to complete, iterate through the list and adjust all of them at once relative to the (near) end-time. While this is possible, I don't think it is worth the additional complexity in the code. Most restarts will not take that long. Moreover, it can be viewed as a restart that succeeds quickly but then the processes are frozen/stopped/descheduled for some time so leases/timers expire. The requirement that all expiries will be measured (on checkpoint and restart) relative to the same point in time is very important (while the above is nice-to-have). Oren. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html