On Tue, Jun 23, 2015 at 12:32:41PM -0400, Theodore Ts'o wrote: > On Thu, Jun 18, 2015 at 04:38:56PM +0200, David Sterba wrote: > > Moving the discussion to fsdevel. > > > > Summary: disabling MS_I_VERSION brings some speedups to btrfs, but the > > generic 'noiversion' option cannot be used to achieve that. It is > > processed before it reaches btrfs superblock callback, where > > MS_I_VERSION is forced. > > > > The proposed fix is to add btrfs-specific i_version/noi_version to btrfs, > > to which I object. > > I was talking to Mingming about this on today's ext4 conference call, > and one of the reasons why ext4 turns off i_version update by default > is because it does a real number on our performance as well --- and > furthermore, the only real user of the field from what we can tell is > NFSv4, which not all that many ext4 users actually care about. > > This has caused pain for the nfsv4 folks since it means that they need > to tell people to use a special mount option for ext4 if they are > actually using this for nfsv4, and I suspect they won't be all that > eager to hear that btrfs is going to go the same way. > > This however got us thinking --- even in if NFSv4 is depending on > i_version, it doesn't actually _look_ at that field all that often. > It's only going to look at it in a response to a client's getattr > call, and that in turn is used to so the client can do its local disk > cache invalidation if anby of the data blocks of the inode has changed. > > So what if we have a per-inode flag which "don't update I_VERSION", > which is off by default, but after the i_version has been updated at > least once, is set, so the i_version field won't be updated again --- > at least until something has actually looked at the i_version field, > when the "don't update I_VERSOIN" flag will get cleared again. > > So basically, if we know there are no microphones in the forest, we > don't need to make the tree fall. However, if someone has sampled the > i_version field, then the next time the inode gets updated, we will > update the i_version field so the NFSv4 client can hear the sound of > the tree crashing to the forst floor and so it can invalidate its > local cache of the file. :-) > > This should significantly improve the performance of using the > i_version field if the file system is being exported via NFSv4, and if > NFSv4 is not in use, no one will be looking at the i_version field, so > the performance impact will be very slight, and thus we could enable > i_version updates by default for btrfs and ext4. > > And this should make the distribution folks happy, since it will unify > the behavior of all file systems, and make life easier for users who > won't need to set certain magic mount options depending on what file > system they are using and whether they are using NFSv4 or not. > > Does this sound reasonable? I agree, this's a promising way to fix the whole thing. Regarding to client's getattr, I found that inode->i_version is not read by calling generic_fillattr(), so I'm a bit missing on how we get to change the flag... Thanks, -liubo > > Cheers, > > - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html