Re: [RFC][PATCH 2/3][cr][v2]: Checkpoint/restart file leases

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





john stultz wrote:
On Fri, 2010-07-30 at 15:45 -0400, Oren Laadan wrote:
Sukadev Bhattiprolu wrote:
Oren Laadan [orenl@xxxxxxxxxxxxxxx] wrote:
| | | > h->fl_type = lock->fl_type;
| > +		h->fl_type_prev = lock->fl_type_prev;
| >  		h->fl_flags = lock->fl_flags;
| > + if (h->fl_type & F_INPROGRESS && | > + (lock->fl_break_time > jiffies))
| > +			h->fl_rem_lease = (lock->fl_break_time - jiffies) / HZ;
| | Hmmm -- we have a ctx->ktime_begin marking the start of the checkpoint.
| It is used for relative-time calculations, for example, the expiry of
| restart-blocks and timeouts.  I suggest that we use it here too to be
| consistent.

Well, I started off using ktime_begin but discussed this with John Stultz
(CC'd here) who pointed out that mixing different domains of time is likely
to cause errors. ktime is an absolute time and fl_break_time uses jiffies -
a relative time.
I think use of ktime_begin for restart_blocks is fine (since they use
ktime_t) but using ktime_t for file leases and converting between jiffies
and nanoseconds could be a problem, unless we convert fl_break_time to
seconds.

IOW, can we leave the above computation of ->fl_rem_lease for now ?
The data on restart_blocks keep relative time - it's the the time
to expiry relative to ktime_begin (which is absolute).

ktime_begin merely gives a reference point in time against which
all other time-related values should be saved. The advantage is
that all time computation are relative to the same point in time
at checkpoint/restart - the time when the ktime_begin is set. It's
more consistent this way.

First, forgive me for not being very aware of the checkpoint/restart
code.

On the contrary, forgive me if I'm stating the obvious below ...


So, ktime_begin is an absolute CLOCK_MONOTONIC time, relative to the
time the system booted (more or less). And it represents the checkpoint
time, correct?

As a rule, all time measurements in the checkpoint image are
saved as relative values, using the start-of-checkpoint as the
reference point in time (*).

So at checkpoint, every absolute time value should be converted
to a value relative to the start-of-checkpoint. At restart, every
relative time value from the image is converted back (if needed)
to an absolute time value using a respective start-of-restart.

This makes sense for the most common case, where if a process
had 5 more seconds to sleep at the time of checkpoint, we would
like it to have 5 more seconds to sleep after it restarts.

(For the rare case where you care about the absolute timeout,
userspace can modify the checkpoint image on-the-fly during
restart to adjust the timeout value to be 0, or close enough
to zero that the timeout will happen as soon as the restart
completes).

That said, ktime_begin has two purposes:

1) It stores the start-of-{checkpoint,restart} during checkpoint
and restart respectively, and used for all conversions from
absolute to relative time values.  This is purely internal use
during the checkpoint/restart operations.
It make sense to use a single value, as opposed to compute the
deltas "on the go", because checkpoint/restart can be lengthy
operations (depending on the image size and IO speed). Doing
so against current time, which is a moving target, can yield
inconsistent results.

(*) Looking at the code, I see that the hrtimer expiry times
are actually calculated against current time - I'll fix that...

2) It provides information for the admin/user about what was
the absolute time when the checkpoint operation began (maybe
we should also end ktime_end...). Otherwise, we would have to
wrap each checkpoint image with additional meta-data from
userspace.
(In particular, having that absolute time in the checkpoint
image will allow a userspace filter to modify timeout values
as needed in the rare case mentioned above).

Is there a similar checkpointed jiffies value?

No. but ...

I don't see the problem with jiffies vs ktime - there are functions
to convert between different units (see jiffies.h). Even if you are
concerned about reducing the resolution because of a conversion -
well, it isn't realistic to expect nano-sec resolution after restart...

You're right, there are functions to convert from relative jiffies to
relative nanoseconds, but there really aren't good functions to convert
from absolute jiffies to absolute CLOCK_MONOTONIC time.

One can make a rough approximation, but its not a very precise method
with a number of complications (ie: 32bit jiffies wraps every ~50 days,
the INITIAL_JIFFIES offset has to be remembered, and the larger issue of
the fact that CLOCK_MONOTONIC is NTP corrected, while jiffies may or may
not be). So I'd strongly advise against trying to directly convert abs
jiffies values to abs CLOCK_MONOTONIC time.

... given this, to be able to compute the relative time where
jiffies are involved (e.g. the remaining lease time relative to
ktime_begin) we could also keep a ctx->jiffies_begin, calculate
the delta, and then convert to ktime_t delta.

Suka: if that works for you, can you please add this piece to
your patch ?

Thanks for the feedback !

Oren.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux