Re: [PATCH v2] BTRFS/NFSD: provide more unique inode number for btrfs export

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> · Wed, 1 Sep 2021 11:22:51 -0400

On Wed, Sep 01, 2021 at 08:20:06AM +0100, Christoph Hellwig wrote:
> On Tue, Aug 31, 2021 at 02:59:05PM +1000, NeilBrown wrote:
> > Making the change purely in btrfs is simply not possible.  There is no
> > way for btrfs to provide nfsd with a different inode number.  To move
> > the bulk of the change into btrfs code we would need - at the very least
> > - some way for nfsd to provide the filehandle when requesting stat
> > information.  We would also need to provide a reference filehandle when
> > requesting a dentry->filehandle conversion.  Cluttering the
> > export_operations like that just for btrfs doesn't seem like the right
> > balance.  I agree that cluttering kstat is not ideal, but it was a case
> > of choosing the minimum change for the maximum effect.
> 
> So you're papering over a btrfs bug by piling up kludges in the nfsd
> code that has no business even knowing about this btrfs bug, while
> leaving other users of inode numbers and file handles broken?
> 
> If you only care about file handles:  this is what the export operations
> are for.  If you care about inode numbers:  well, it is up to btrfs
> to generate unique inode numbers.  It currently doesn't do that, and
> no amount of papering over that in nfsd is going to fix the issue.
> 
> If XORing a little more entropy

It's stronger than "a little more entropy".  We know enough about how
the numbers being XOR'd grow to know that collisions are only going to
happen in some extreme use cases.  (If I understand correctly.)
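
A minimal sketch of that argument, for illustration only.  It assumes
that both the inode number and the subvolume id are allocated upward
from small values, and that the subvolume id is byte-swapped before
being XOR'd in, so its growth lands in the high bits.  The names
swab64 and mixed_fileid are hypothetical and are not taken from the
patch:

```python
MASK64 = (1 << 64) - 1


def swab64(x: int) -> int:
    """Reverse the byte order of a 64-bit value (illustrative helper)."""
    return int.from_bytes((x & MASK64).to_bytes(8, "little"), "big")


def mixed_fileid(ino: int, subvol_id: int) -> int:
    """Hypothetical mixing: XOR the byte-swapped subvol id into the inode number.

    Assumption: small subvol ids end up in the top bytes after the swap,
    while small inode numbers stay in the low bytes, so the two do not
    overlap until one of them grows very large.
    """
    return (ino ^ swab64(subvol_id)) & MASK64


# Typical case: modest inode numbers and subvol ids never collide,
# because they occupy disjoint bit ranges.
ids = {mixed_fileid(i, s) for i in range(1, 1000) for s in range(256, 300)}
no_collisions = len(ids) == 999 * 44

# Extreme case: an inode number large enough to reach the top byte
# can cancel out the swapped subvol id and collide.
collides = mixed_fileid(1, 0) == mixed_fileid(1 | (1 << 56), 1)
```

Under these assumptions a collision needs an inode number whose high
bits overlap the swapped subvol id, which is the kind of extreme case
meant above.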

> into the inode number is a good enough band aid (and I strongly
> disagree with that), do it inside btrfs for every place they report
> the inode number.  There is nothing NFS-specific about that.

Neil tried something like that:

	https://lore.kernel.org/linux-nfs/162761259105.21659.4838403432058511846@xxxxxxxxxxxxxxxxxxxxx/

	"The patch below, which is just a proof-of-concept, changes
	btrfs to report a uniform st_dev, and different (64bit) st_ino
	in different subvols."

(Though actually you're proposing keeping separate st_dev?)

I looked back through a couple threads to try to understand why we
couldn't do that (on new filesystems, with a mkfs option to choose new
or old behavior) and still don't understand.  But the threads are long.

There are objections to a new mount option (which seem obviously wrong;
this should be a persistent feature of the on-disk filesystem).

--b.