Re: i_version changes

"NeilBrown" <neilb@xxxxxxx> · Thu, 14 Feb 2008 09:06:44 +1100 (EST)

On Thu, February 14, 2008 8:32 am, Peter Staubach wrote:
>
> I don't think that this is quite true.  If the file is changed
> when the NFS server is not running, then the value of i_version
> which is used when the NFS server starts up again must be
> different than the value which was previously used when the NFS
> server was previously running.

As I said, the "NFS has seen this i_version" flag needs to be on
stable storage, e.g. the lsb of the i_version.  This will ensure that
any change after NFSD saw the i_version will cause the i_version to
be updated.
So I think it can provide correct semantics.
Precise details:
  NFSD: when reading i_version
      take lock
      tmp = i_version
      i_version |= 1
      drop lock
      return tmp & ~1;

  VFS when making any change:
      changed = 0
      take lock
      if (i_version & 1) {
           i_version++;
           changed = 1;
      }
      drop lock
      if changed, sync inode
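
For illustration only, here is a minimal userspace sketch of the two
paths above in C.  The struct and function names are hypothetical
(this is not kernel code), the caller is assumed to hold the lock, and
"sync inode" is reduced to a return value telling the caller that a
sync is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; not the kernel's struct inode. */
struct demo_inode {
    uint64_t i_version;   /* lsb doubles as the "NFSD has seen this" flag */
};

#define NFSD_SEEN 1ULL

/*
 * NFSD side.  Caller must hold the inode's lock.
 * Returns the version with the flag masked off and sets the flag, so
 * the next change is forced to bump the version.
 */
static uint64_t nfsd_read_version(struct demo_inode *inode)
{
    uint64_t tmp = inode->i_version;

    inode->i_version |= NFSD_SEEN;
    return tmp & ~NFSD_SEEN;
}

/*
 * VFS side, called on any change.  Caller must hold the inode's lock.
 * Returns true if the version moved, i.e. the caller should sync the
 * inode after dropping the lock.
 */
static bool vfs_note_change(struct demo_inode *inode)
{
    if (!(inode->i_version & NFSD_SEEN))
        return false;     /* nobody has seen this value; nothing to do */

    /* Incrementing an odd value carries out of the lsb: this clears
     * the flag and advances the visible version in one step. */
    inode->i_version++;
    assert((inode->i_version & NFSD_SEEN) == 0);
    return true;
}
```

Starting from i_version == 4, a read by NFSD returns 4 and leaves 5
in the inode; the next change moves it to 6 and asks for a sync;
further changes leave it at 6 until NFSD reads again.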

>
> Is the perceived performance hit really going to be as large
> as suspected?  We already update the time fields fairly often
> and we don't pay a huge penalty for those, or at least not a
> penalty that we aren't willing to pay.  Has anyone measured
> the cost?

Correct NFS semantics require that the i_version be written to disk
before (or when) the change is committed.  That means lots more inodes
in the journal.
If you are already doing data=journal, the hit probably isn't too
high (?)

You are right:  measuring the cost is important.  However as we are
designing a generic filesystem interface, we need to understand the
cost on multiple filesystems in a variety of configurations ... or
give the filesystem complete information and let it decide the optimal
implementation.

Giving the filesystem full information means having an inode_operation
"nfsd_reads_version" which returns the number to be used as change_id.

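A rough sketch of what such a hook could look like, again in userspace
C.  Only the name nfsd_reads_version comes from the proposal above; the
structs, the two example filesystems, and the fallback to the raw
i_version when no hook is set are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct hook_inode;

/* Hypothetical operations table with the proposed hook. */
struct hook_inode_operations {
    uint64_t (*nfsd_reads_version)(struct hook_inode *inode);
};

/* Hypothetical stand-in for an inode; not the kernel's struct inode. */
struct hook_inode {
    const struct hook_inode_operations *i_op;
    uint64_t i_version;
};

/* Example filesystem A: uses the lsb as a "seen by NFSD" flag. */
static uint64_t flagfs_nfsd_reads_version(struct hook_inode *inode)
{
    uint64_t tmp = inode->i_version;

    inode->i_version |= 1;
    return tmp & ~(uint64_t)1;
}

/* Example filesystem B: assumed to bump and journal i_version on every
 * change anyway, so it just reports the counter. */
static uint64_t plainfs_nfsd_reads_version(struct hook_inode *inode)
{
    return inode->i_version;
}

static const struct hook_inode_operations flagfs_iops = {
    .nfsd_reads_version = flagfs_nfsd_reads_version,
};

static const struct hook_inode_operations plainfs_iops = {
    .nfsd_reads_version = plainfs_nfsd_reads_version,
};

/* What NFSD would call to obtain the change_id.  Falling back to the
 * raw i_version when the hook is absent is an assumption. */
static uint64_t nfsd_change_id(struct hook_inode *inode)
{
    assert(inode != NULL);
    if (inode->i_op && inode->i_op->nfsd_reads_version)
        return inode->i_op->nfsd_reads_version(inode);
    return inode->i_version;
}
```

The point of the indirection is that NFSD only ever calls
nfsd_change_id(), and each filesystem picks whichever scheme is
cheapest for its own journalling.
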
NeilBrown
