Re: rfc: [patch] change attribute for ext3

On Thu, Sep 14, 2006 at 03:23:18AM -0600, Andreas Dilger wrote:
> On Sep 13, 2006  20:30 +0200, Alexandre Ratchov wrote:
> > On Wed, Sep 13, 2006 at 02:11:11PM -0400, Trond Myklebust wrote:
> > > On Wed, 2006-09-13 at 18:42 +0200, Alexandre Ratchov wrote:
> > > > the change attribute is a simple counter that is reset to zero on
> > > > inode creation and that is incremented every time the inode data is
> > > > modified (similarly to the "ctime" time-stamp).
> > > 
> > > I would really have preferred a full-blown 64-bit counter as per
> > > RFC3530, but I suppose we could always combine this change attribute
> > > with the high word from ctime in order to make up the NFSv4 change
> > > attribute. That should keep us safe until someone develops a ramdisk
> > > with < 1 nsecond access time.
> > 
> > do you mean something like "(ctime.tv_sec << 32) | change_attribute"? this
> > would allow 2^32 inode changes per second.
> 
> It might be preferable, since we are depending on the ctime here anyway,
> to combine this with the nsec-resolution ctime, and kill two birds with
> one field in the inode.
> 
> The implementation would be to update the ctime+nsec field as normal, but
> in the unlikely case that both the second+nsec ctime is the same as before
> the nsec value would be incremented by 1.  This could happen in case of
> low-resolution kernel timers, and would also handle the future case where
> the inode is modified more than once in the same nanosecond.
> 
> The other benefit is that it allows comparisons between two different
> inodes to be more meaningful, instead of just using the seconds + random
> version number.
> 
> It would be possible/desirable to make the nsec ctime field be part of the 
> small inode (using the proposed reserved field) instead of the large inode,
> since that is a requirement for working with existing ext3 filesystems.  The
> previous nsec timestamp patch would only need trivial modifications to make
> this work, just #define i_ctime_extra to be l_i_reserved1 I believe.
> 
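
For reference, the "(ctime.tv_sec << 32) | change_attribute" combination
quoted above could look roughly like the sketch below. The function name and
the fixed 32-bit widths are assumptions for illustration, not part of the
patch. The cast matters: shifting a 32-bit value left by 32 is undefined in C.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: build a 64-bit
 * NFSv4-style change value with the ctime seconds in the high word
 * and the 32-bit per-inode change counter in the low word. */
uint64_t nfs4_change_value(uint32_t ctime_sec, uint32_t change_attribute)
{
    /* Widen before shifting so the seconds land in bits 32..63. */
    return ((uint64_t)ctime_sec << 32) | change_attribute;
}
```

With this layout a later second always compares greater than an earlier one,
whatever the counter holds, and the counter has room for 2^32 changes within
one second.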

There is something I dislike about incrementing the nsec value. The ctime is
a global (as opposed to per-inode) time reference for the file-system, and
it is expected to be globally coherent. Imagine the following situation:

Within the same time-slice (with time-stamp T0, in nanoseconds), we do the
following in this order:

change file1	-> 	ctime = T0
change file2	->	ctime = T0
change file2	->	ctime = T0 + 1
change file2	->	ctime = T0 + 2
change file1	->	ctime = T0 + 1

so it appears that file2 is strictly newer than file1, which is false: file1
was changed last. So the assumption "if ctime(file1) < ctime(file2) then
file2 is newer than file1" is no longer true.
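
The sequence above can be replayed with a small user-space model. This is
only a sketch: the struct and function names are made up, and the bump rule
is my reading of the proposal (use the clock value if it is past the stored
ctime, otherwise add 1 ns to the stored ctime).

```c
#include <stdint.h>

/* Toy stand-in for an inode: only the ctime, in nanoseconds. */
struct toy_inode {
    uint64_t ctime_ns;
};

/* Per-inode rule (assumed reading of the proposal): take the clock
 * value if it moved past the stored ctime, otherwise bump by 1 ns. */
void toy_touch(struct toy_inode *inode, uint64_t now_ns)
{
    if (now_ns > inode->ctime_ns)
        inode->ctime_ns = now_ns;
    else
        inode->ctime_ns += 1;
}

/* Replays the five changes with the clock stuck at t0 (t0 > 0).
 * Returns 1 if file1, which was changed last, ends up with the
 * smaller ctime. */
int toy_replay_shows_anomaly(uint64_t t0)
{
    struct toy_inode file1 = { 0 }, file2 = { 0 };

    toy_touch(&file1, t0);  /* file1: T0     */
    toy_touch(&file2, t0);  /* file2: T0     */
    toy_touch(&file2, t0);  /* file2: T0 + 1 */
    toy_touch(&file2, t0);  /* file2: T0 + 2 */
    toy_touch(&file1, t0);  /* file1: T0 + 1 */

    return file1.ctime_ns < file2.ctime_ns;
}
```

The replay ends with file1 at T0 + 1 and file2 at T0 + 2, although file1 was
touched last.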

In order to fix this, we'll need to increment a global counter, not a
per-inode counter. It's feasible.
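
A minimal sketch of what I mean by a global counter, again in user space and
with made-up names. It assumes one "last ctime handed out" value per
file-system and ignores locking, which a real implementation would need.

```c
#include <stdint.h>

/* Hypothetical per-file-system state: the last ctime handed out, in
 * nanoseconds. Not protected against concurrent callers here. */
static uint64_t fs_last_ctime_ns;

/* Returns a ctime that is strictly greater than every ctime returned
 * before, even if the clock has not advanced. */
uint64_t fs_next_ctime(uint64_t now_ns)
{
    if (now_ns > fs_last_ctime_ns)
        fs_last_ctime_ns = now_ns;
    else
        fs_last_ctime_ns += 1;
    return fs_last_ctime_ns;
}
```

With the same five changes and the clock stuck at T0, file1 gets T0, file2
gets T0 + 1, T0 + 2 and T0 + 3, and the final change to file1 gets T0 + 4, so
the ordering by ctime matches the order of the changes.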

cheers,

-- Alexandre