Re: i_version changes

Peter Staubach <staubach@xxxxxxxxxx> · Thu, 14 Feb 2008 09:34:21 -0500

NeilBrown wrote:
On Thu, February 14, 2008 8:32 am, Peter Staubach wrote:

I don't think that this is quite true.  If the file is changed
when the NFS server is not running, then the value of i_version
used when the NFS server starts up again must differ from the
value that was used while the NFS server was previously running.

As I said, the "NFS has seen this i_version" flag needs to be on
stable storage, e.g. the least significant bit (lsb) of the i_version.
This will ensure that any change after NFSD saw the i_version will
cause the i_version to be updated.
So I think it can provide correct semantics.
Precise details:
  NFSD: when reading i_version
      take lock
      tmp = i_version
      i_version |= 1
      drop lock
      return tmp & ~1;

  VFS when making any change:
      changed = 0
      take lock
      if (i_version & 1) {
           i_version++;
           changed = 1;
      }
      drop lock
      if changed, sync inode
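To make the protocol above concrete, here is a minimal userspace sketch in C that follows the pseudocode literally. struct inode_sketch, sync_inode_sketch() and the syncs counter are made up for illustration and are not kernel interfaces; a pthread mutex stands in for the inode lock, and "syncing" only counts how often the inode would have been written. Like the pseudocode, it does not show how the flag set by NFSD itself reaches disk.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for an inode (not struct inode). */
struct inode_sketch {
    pthread_mutex_t lock;   /* stands in for the inode lock */
    uint64_t i_version;     /* lsb set = "NFSD has seen this value" */
    unsigned syncs;         /* number of times the inode was "synced" */
};

/* Stand-in for writing the inode to stable storage: only counts. */
static void sync_inode_sketch(struct inode_sketch *inode)
{
    pthread_mutex_lock(&inode->lock);
    inode->syncs++;
    pthread_mutex_unlock(&inode->lock);
}

/* NFSD side: return the change attribute and mark it as seen. */
static uint64_t nfsd_read_version(struct inode_sketch *inode)
{
    pthread_mutex_lock(&inode->lock);
    uint64_t tmp = inode->i_version;
    inode->i_version |= 1;              /* set the "seen" flag */
    pthread_mutex_unlock(&inode->lock);
    return tmp & ~(uint64_t)1;          /* never expose the flag bit */
}

/* VFS side: call on every change to the file's data or metadata. */
static void vfs_note_change(struct inode_sketch *inode)
{
    int changed = 0;

    pthread_mutex_lock(&inode->lock);
    if (inode->i_version & 1) {         /* bump only if NFSD saw it */
        inode->i_version++;             /* odd + 1: clears the flag */
        assert((inode->i_version & 1) == 0);
        changed = 1;
    }
    pthread_mutex_unlock(&inode->lock);

    if (changed)
        sync_inode_sketch(inode);       /* new value must reach disk */
}
```

Starting from i_version 0, one NFSD read returns 0, two following changes cost a single sync and leave i_version at 2, and the next NFSD read returns 2. Changes that nobody has observed do not touch the inode at all, so the extra inode writes are bounded by one per NFSD read.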

Yes, this does seem like it would do the job.  It could perhaps
be optimized somewhat to avoid lock contention, but I do think
that this would suffice.

Is the perceived performance hit really going to be as large
as suspected?  We already update the time fields fairly often
and we don't pay a huge penalty for those, or at least not a
penalty that we aren't willing to pay.  Has anyone measured
the cost?

Correct NFS semantics require that the i_version be written to disk
before (or when) the change is committed.  That means lots more inodes
in the journal.
If you are already doing data=journal, the hit probably isn't too
high.(?)

Correct NFS semantics also require that any modified metadata,
including file times and file size, also be written to stable
storage.  Isn't this just another piece of modified metadata
that would go hand-in-hand with updated file times?

We should also require that the file mtime change when the
contents of the file are modified.  This should happen whether
or not the clock has ticked.  Unfortunately, to implement this,
we would need file time resolution finer than the granularity of
the system clock.  We could probably get away with nanosecond
resolution in the file system.
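The requirement can be illustrated with a small C sketch under one assumption that is not from the thread: the filesystem stores mtime with nanosecond precision while the clock it reads is coarser. The hypothetical next_mtime() helper returns the clock value when it has moved past the old mtime and otherwise steps the stored value forward by one nanosecond, so every modification still yields a distinct mtime. Callers are assumed to serialize updates to a given inode.

```c
#include <assert.h>
#include <time.h>

/* Nonzero if a is strictly later than b (tv_nsec in [0, 1e9)). */
static int ts_after(struct timespec a, struct timespec b)
{
    if (a.tv_sec != b.tv_sec)
        return a.tv_sec > b.tv_sec;
    return a.tv_nsec > b.tv_nsec;
}

/*
 * Hypothetical helper: compute the new mtime for a modification.
 * If the (coarse) clock has not moved past the old mtime, step the
 * stored value forward by one nanosecond so it still changes.
 */
static struct timespec next_mtime(struct timespec old, struct timespec now)
{
    if (ts_after(now, old))
        return now;

    struct timespec bumped = old;
    if (++bumped.tv_nsec == 1000000000L) {  /* carry into seconds */
        bumped.tv_nsec = 0;
        bumped.tv_sec++;
    }
    assert(ts_after(bumped, old));
    return bumped;
}
```

For example, two writes within the same coarse tick at 100 s produce mtimes of 100 s + 1 ns and 100 s + 2 ns. The trade-off of this design is that, under a burst of writes, the stored mtime can run slightly ahead of the real clock.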

   Thanx...

      ps

You are right:  measuring the cost is important.  However, as we are
designing a generic filesystem interface, we need to understand the
cost on multiple filesystems in a variety of configurations, or
give the filesystem complete information and let it decide the optimal
implementation.

Giving the filesystem full information means having an inode_operation
"nfsd_reads_version" which returns the number to be used as change_id.
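A rough C sketch of what such a hook might look like. struct inode_ops_sk, struct inode_sk, nfsd_change_id() and examplefs are hypothetical names, not the kernel's inode_operations; only the idea of an nfsd_reads_version operation comes from the message. A filesystem that provides the hook decides for itself how to produce the change_id, and the others fall back to the lsb-flag scheme (locking and syncing omitted here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct inode_sk;

/* Hypothetical operations table holding only the hook discussed. */
struct inode_ops_sk {
    /* Returns the change_id for NFSD; may be NULL. */
    uint64_t (*nfsd_reads_version)(struct inode_sk *inode);
};

struct inode_sk {
    const struct inode_ops_sk *i_op;
    uint64_t i_version;
};

/* Generic fallback: the lsb "seen" flag scheme (no locking here). */
static uint64_t generic_nfsd_reads_version(struct inode_sk *inode)
{
    uint64_t tmp = inode->i_version;
    inode->i_version |= 1;
    return tmp & ~(uint64_t)1;
}

/* What NFSD would call to obtain the change attribute. */
static uint64_t nfsd_change_id(struct inode_sk *inode)
{
    assert(inode != NULL);
    if (inode->i_op != NULL && inode->i_op->nfsd_reads_version != NULL)
        return inode->i_op->nfsd_reads_version(inode);
    return generic_nfsd_reads_version(inode);
}

/* Hypothetical filesystem that already commits i_version with every
 * change, so it needs no "seen" flag and returns the value as-is. */
static uint64_t examplefs_reads_version(struct inode_sk *inode)
{
    return inode->i_version;
}

static const struct inode_ops_sk examplefs_iops = {
    .nfsd_reads_version = examplefs_reads_version,
};
```

The design choice here is that NFSD never needs to know which strategy a filesystem uses: a filesystem that can persist i_version cheaply skips the flag entirely, while one that cannot keeps the generic behaviour.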

NeilBrown

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html