Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export

"NeilBrown" <neilb@xxxxxxx> · Mon, 16 Aug 2021 08:17:30 +1000

On Mon, 16 Aug 2021, Roman Mamedov wrote:
> 
> I wondered a bit myself, what are the downsides of just doing the
> uniquefication inside Btrfs, not leaving that to NFSD?
> 
> I mean not even adding the extra stat field, just return the inode itself with
> that already applied. Surely cannot be any worse collision-wise, than
> different subvolumes straight up having the same inode numbers as right now?
> 
> Or is it a performance concern, always doing more work, for something which
> only NFSD has needed so far.

Any change in behaviour will have unexpected consequences.  I think the
btrfs maintainers' perspective is that they don't want to change
behaviour if they don't have to (which is reasonable) and that currently
they don't have to (which probably means that users aren't complaining
loudly enough).

NFS export of BTRFS is already demonstrably broken and users are
complaining loudly enough that I can hear them ....  though I think it
has been broken like this for 10 years, so I wonder that I didn't hear
them before.

If something is perceived as broken, then a behaviour change that
appears to fix it is more easily accepted.

However, having said that I now see that my latest patch is not ideal.
It changes the inode numbers associated with filehandles of objects in
the non-root subvolume.  This will cause the Linux NFS client to treat
the object as 'stale'.  For most objects this is a transient annoyance.
Reopen the file or restart the process and all should be well again.
However if the inode number of the mount point changes, you will need to
unmount and remount.  That is somewhat more of an annoyance.
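
To make the failure mode concrete, here is a rough model in Python (not
kernel code).  It assumes the uniquifier is merged in with XOR and is
zero for the root subvolume, and that the client simply compares the
fileid it cached for a filehandle with the one the server now reports.
All names are hypothetical and the real client logic is more involved.

```python
MASK64 = (1 << 64) - 1


def reported_fileid(ino, uniquifier, merge):
    """Inode number the server reports for an object.

    XOR as the merge operation is an assumption made for this sketch.
    """
    return (ino ^ uniquifier) & MASK64 if merge else ino


class ClientCache:
    """Toy client: remembers the fileid first seen for each filehandle."""

    def __init__(self):
        self.fileid = {}

    def revalidate(self, fh, fileid):
        # First sighting is cached; a later mismatch is treated as stale.
        cached = self.fileid.setdefault(fh, fileid)
        return "ok" if cached == fileid else "stale"


if __name__ == "__main__":
    cache = ClientCache()
    uniq = 256 << 40  # hypothetical uniquifier for a non-root subvolume
    print(cache.revalidate(b"fh", reported_fileid(257, uniq, False)))
    print(cache.revalidate(b"fh", reported_fileid(257, uniq, True)))
```

In this model the same filehandle reports a different fileid once the
merge is switched on, so the second check fails.  With a uniquifier of
zero nothing changes, which matches only non-root subvolumes being
affected.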

There are a few ways to handle this more gracefully.

1/ We could get btrfs to hand out new filehandles as well as new inode
numbers, but still accept the old filehandles.  Then we could make the
inode number reported be based on the filehandle.  This would be nearly
seamless but rather clumsy to code.  I'm not *very* keen on this idea,
but it is worth keeping in mind.

2/ We could add a btrfs mount option to control whether the uniquifier
was set or not.  This would allow the sysadmin to choose when to manage
any breakage.  I think this is my preference, but Josef has declared an
aversion to mount options.

3/ We could add a module parameter to nfsd to control whether the
uniquifier is merged in.  This again gives the sysadmin control, and it
can be done despite any aversion from btrfs maintainers.  But I'd need
to overcome any aversion from the nfsd maintainers, and I don't know how
strong that would be yet. (A new export option isn't really appropriate.
It is much more work to add an export option than to add a mount option).
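
As a sketch of how option 1 might behave, again in Python rather than
kernel code: filehandles carry a format tag, the server accepts both
formats, and the reported inode number depends on which format the
client presented.  The tag values, the uniquifier formula, the XOR
merge and the root subvolume constant are all assumptions made for
illustration, not what btrfs or nfsd actually do.

```python
from typing import NamedTuple

MASK64 = (1 << 64) - 1
ROOT_SUBVOL = 5          # hypothetical id of the exported root subvolume
FH_OLD, FH_NEW = 1, 2    # hypothetical filehandle format tags


class FileHandle(NamedTuple):
    version: int
    subvol: int
    ino: int


def uniquifier(subvol):
    # Hypothetical formula: zero for the root, otherwise the subvolume
    # id shifted into the high bits.
    return 0 if subvol == ROOT_SUBVOL else (subvol << 40) & MASK64


def encode_fh(subvol, ino):
    # New lookups are always handed a new-format filehandle.
    return FileHandle(FH_NEW, subvol, ino)


def fileid_for(fh):
    # Old filehandles keep reporting the old inode number, so clients
    # holding them never see the fileid change underneath them.
    if fh.version == FH_OLD:
        return fh.ino
    return fh.ino ^ uniquifier(fh.subvol)
```

A client that still holds FileHandle(FH_OLD, 256, 257) keeps seeing
inode 257, while a client that looks the object up afresh gets a
new-format handle and the merged number.  Options 2 and 3 amount to
replacing the per-filehandle test with a single switch set by the
sysadmin.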

I don't know.... maybe I should try harder to like option 1, or at least
verify if it works as expected and see how ugly the code really is.

NeilBrown