john stultz wrote: > On Fri, 2010-07-30 at 15:45 -0400, Oren Laadan wrote: >> Sukadev Bhattiprolu wrote: >>> Oren Laadan [orenl@xxxxxxxxxxxxxxx] wrote: >>> | >>> | >>> | > h->fl_type = lock->fl_type; >>> | > + h->fl_type_prev = lock->fl_type_prev; >>> | > h->fl_flags = lock->fl_flags; >>> | > + if (h->fl_type & F_INPROGRESS && >>> | > + (lock->fl_break_time > jiffies)) >>> | > + h->fl_rem_lease = (lock->fl_break_time - jiffies) / HZ; >>> | >>> | Hmmm -- we have a ctx->ktime_begin marking the start of the checkpoint. >>> | It is used for relative-time calculations, for example, the expiry of >>> | restart-blocks and timeouts. I suggest that we use it here too to be >>> | consistent. >>> >>> Well, I started off using ktime_begin but discussed this with John Stultz >>> (CC'd here) who pointed out that mixing different domains of time is likely >>> to cause errors. ktime is an absolute time and fl_break_time uses jiffies - >>> a relative time. >>> >>> I think use of ktime_begin for restart_blocks is fine (since they use >>> ktime_t) but using ktime_t for file leases and converting between jiffies >>> and nanoseconds could be a problem, unless we convert fl_break_time to >>> seconds. >>> >>> IOW, can we leave the above computation of ->fl_rem_lease for now ? >> The data on restart_blocks keep relative time - it's the the time >> to expiry relative to ktime_begin (which is absolute). >> >> ktime_begin merely gives a reference point in time against which >> all other time-related values should be saved. The advantage is >> that all time computation are relative to the same point in time >> at checkpoint/restart - the time when the ktime_begin is set. It's >> more consistent this way. > > First, forgive me for not being very aware of the checkpoint/restart > code. On the contrary, forgive me if I'm stating the obvious below ... > > So, ktime_begin is an absolute CLOCK_MONOTONIC time, relative to the > time the system booted (more or less). And it represents the checkpoint > time, correct? As a rule, all time measurements in the checkpoint image are saved as relative values, using the start-of-checkpoint as the reference point in time (*). So at checkpoint, every absolute time value should be converted to a value relative to the start-of-checkpoint. At restart, every relative time value from the image is converted back (if needed) to an absolute time value using a respective start-of-restart. This makes sense for the most common case, where if a process had 5 more seconds to sleep at the time of checkpoint, we would like it to have 5 more seconds to sleep after it restarts. (For the rare case where you care about the absolute timeout, userspace can modify the checkpoint image on-the-fly during restart to adjust the timeout value to be 0, or close enough to zero that the timeout will happen as soon as the restart completes). That said, ktime_begin has two purposes: 1) It stores the start-of-{checkpoint,restart} during checkpoint and restart respectively, and used for all conversions from absolute to relative time values. This is purely internal use during the checkpoint/restart operations. It make sense to use a single value, as opposed to compute the deltas "on the go", because checkpoint/restart can be lengthy operations (depending on the image size and IO speed). Doing so against current time, which is a moving target, can yield inconsistent results. (*) Looking at the code, I see that the hrtimer expiry times are actually calculated against current time - I'll fix that... 2) It provides information for the admin/user about what was the absolute time when the checkpoint operation began (maybe we should also end ktime_end...). Otherwise, we would have to wrap each checkpoint image with additional meta-data from userspace. (In particular, having that absolute time in the checkpoint image will allow a userspace filter to modify timeout values as needed in the rare case mentioned above). > Is there a similar checkpointed jiffies value? No. but ... >> I don't see the problem with jiffies vs ktime - there are functions >> to convert between different units (see jiffies.h). Even if you are >> concerned about reducing the resolution because of a conversion - >> well, it isn't realistic to expect nano-sec resolution after restart... > > You're right, there are functions to convert from relative jiffies to > relative nanoseconds, but there really aren't good functions to convert > from absolute jiffies to absolute CLOCK_MONOTONIC time. > > One can make a rough approximation, but its not a very precise method > with a number of complications (ie: 32bit jiffies wraps every ~50 days, > the INITIAL_JIFFIES offset has to be remembered, and the larger issue of > the fact that CLOCK_MONOTONIC is NTP corrected, while jiffies may or may > not be). So I'd strongly advise against trying to directly convert abs > jiffies values to abs CLOCK_MONOTONIC time. ... given this, to be able to compute the relative time where jiffies are involved (e.g. the remaining lease time relative to ktime_begin) we could also keep a ctx->jiffies_begin, calculate the delta, and then convert to ktime_t delta. Suka: if that works for you, can you please add this piece to your patch ? Thanks for the feedback ! Oren. _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers