Re: [RFC][PATCH 2/3][cr][v2]: Checkpoint/restart file leases

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 30, 2010 at 05:35:04PM -0700, Sukadev Bhattiprolu wrote:
> Oren Laadan [orenl@xxxxxxxxxxxxxxx] wrote:
> >
> >
> > john stultz wrote:
> >> On Fri, 2010-07-30 at 15:45 -0400, Oren Laadan wrote:
> >>> Sukadev Bhattiprolu wrote:
> >>>> Oren Laadan [orenl@xxxxxxxxxxxxxxx] wrote:
> >>>> | | | >  		h->fl_type = lock->fl_type;
> >>>> | > +		h->fl_type_prev = lock->fl_type_prev;
> >>>> | >  		h->fl_flags = lock->fl_flags;
> >>>> | > +		if (h->fl_type & F_INPROGRESS && | > 
> >>>> +					(lock->fl_break_time > jiffies))
> >>>> | > +			h->fl_rem_lease = (lock->fl_break_time - jiffies) / HZ;
> >>>> | | Hmmm -- we have a ctx->ktime_begin marking the start of the 
> >>>> checkpoint.
> >>>> | It is used for relative-time calculations, for example, the expiry of
> >>>> | restart-blocks and timeouts.  I suggest that we use it here too to be
> >>>> | consistent.
> >>>>
> >>>> Well, I started off using ktime_begin but discussed this with John Stultz
> >>>> (CC'd here) who pointed out that mixing different domains of time is likely
> >>>> to cause errors. ktime is an absolute time and fl_break_time uses jiffies -
> >>>> a relative time. 
> >>>>
> >>>> I think use of ktime_begin for restart_blocks is fine (since they use
> >>>> ktime_t) but using ktime_t for file leases and converting between jiffies
> >>>> and nanoseconds could be a problem, unless we convert fl_break_time to
> >>>> seconds.
> >>>>
> >>>> IOW, can we leave the above computation of ->fl_rem_lease for now ?
> >>> The data on restart_blocks keep relative time - it's the the time
> >>> to expiry relative to ktime_begin (which is absolute).
> >>>
> >>> ktime_begin merely gives a reference point in time against which
> >>> all other time-related values should be saved. The advantage is
> >>> that all time computation are relative to the same point in time
> >>> at checkpoint/restart - the time when the ktime_begin is set. It's
> >>> more consistent this way.
> >>
> >> First, forgive me for not being very aware of the checkpoint/restart
> >> code. 
> >
> > On the contrary, forgive me if I'm stating the obvious below ...
> >
> >>
> >> So, ktime_begin is an absolute CLOCK_MONOTONIC time, relative to the
> >> time the system booted (more or less). And it represents the checkpoint
> >> time, correct?
> >
> > As a rule, all time measurements in the checkpoint image are
> > saved as relative values, using the start-of-checkpoint as the
> > reference point in time (*).
> >
> > So at checkpoint, every absolute time value should be converted
> > to a value relative to the start-of-checkpoint. At restart, every
> > relative time value from the image is converted back (if needed)
> > to an absolute time value using a respective start-of-restart.
> 
> One general observation, slightly off-topic. You mention 
> "start-of-restart" here and ...
> >
> > This makes sense for the most common case, where if a process
> > had 5 more seconds to sleep at the time of checkpoint, we would
> > like it to have 5 more seconds to sleep after it restarts.
> 
> ... "after it restarts" here. These two can be quite different if,
> as you mention below, the C/R is a lengthy operation.

Worse! I think the big concern is not the duration of checkpoint but the
amount of time userspace expects to store a checkpoint before
restarting.

It's an arbitrary amount of time. A user could restart 50 days later.
Or 100. etc. Sure, it's _unlikely_, but we (checkpoint/restart implementers)
shouldn't count on seeing only "reasonable" times between completion of
checkpoint and initiation of restart. We need to be robust for all
the times we see.

I think that's why Oren made the choices he did. Making times relative
to ktime_begin certainly helps keep the time values semi-sane for those
crazy arbitrary cases in addition to being nice for the "reasonable"
cases.

Cheers,
	-Matt
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux