Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export

"NeilBrown" <neilb@xxxxxxx> · Thu, 19 Aug 2021 07:46:22 +1000

On Thu, 19 Aug 2021, Wang Yugui wrote:
> Hi,
> 
> We use  'swab64' to combinate 'subvol id' and 'inode' into 64bit in this
> patch.
> 
> case1:
> 'subvol id': 16bit => 64K, a little small because the subvol id is
> always increase?
> 'inode':	48bit * 4K per node, this is big enough.
> 
> case2:
> 'subvol id': 24bit => 16M,  this is big enough.
> 'inode':	40bit * 4K per node => 4 PB.  this is a little small?

I don't know what point you are trying to make with the above.

> 
> Is there a way to 'bit-swap' the subvol id, rather the current byte-swap?

Sure:
   new = 0;
   for (i = 0; i < 64; i++) {
        new = (new << 1) | (old & 1);
        old >>= 1;
   }

but would it gain anything significant?
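
To make the comparison concrete, here is a minimal userspace sketch of
both transforms.  byteswap64() is only a stand-in for the kernel's
swab64(), and the function names and the subvolume id 257 are
hypothetical values picked for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the order of all 64 bits: bit 0 becomes bit 63, and so on. */
static uint64_t bitrev64(uint64_t old)
{
	uint64_t new = 0;
	int i;

	for (i = 0; i < 64; i++) {
		new = (new << 1) | (old & 1);
		old >>= 1;
	}
	return new;
}

/* Userspace stand-in for the kernel's swab64(): reverse the byte order. */
static uint64_t byteswap64(uint64_t x)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (x & 0xff);
		x >>= 8;
	}
	return r;
}

/* Hypothetical small subvolume id, 257 == 0x0101. */
static void demo_small_subvol_id(void)
{
	/* Byte swap: the two low bytes become the two high bytes. */
	assert(byteswap64(257) == 0x0101000000000000ULL);
	/* Bit reversal: bits 0 and 8 become bits 63 and 55. */
	assert(bitrev64(257) == 0x8080000000000000ULL);
}
```

Either way a small id ends up in the top of the word.  Bit reversal
only pushes it a few bits higher than the byte swap does.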

Remember what the goal is.  Most apps don't care at all about duplicate
inode numbers - only a few do, and they only care about a few inodes.
The only bug I actually have a report of is caused by a directory having
the same inode as an ancestor.  i.e.  in lots of cases, duplicate inode
numbers won't be noticed.

btrfs over NFS RELIABLY produces exactly this situation: a directory
having the same inode number as an ancestor.  The root of a subtree will
*always* do this.  If we JUST changed the inode numbers
of the roots of subtrees, then most observed problems would go away.  It
would change from "trivial to reproduce" to "rarely happens".  The patch
I actually propose makes it much more unlikely than that.  Even if
duplicate inode numbers do happen, the chance of them being noticed is
infinitesimal.  Given that, there is no point in minor tweaks unless
they can make duplicate inode numbers IMPOSSIBLE.
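
A rough sketch of why the subtree roots are the reliable case.  Two
assumptions here that are not spelled out above: that the root
directory of every subvolume gets inode number 256, and that the
subvolume id is mixed in by XORing the inode number with the
byte-swapped id.  exported_ino() and the subvolume ids 256 and 257 are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: inode number of the root directory of every subvolume. */
#define SUBVOL_ROOT_INO 256ULL

/* Userspace stand-in for the kernel's swab64(): reverse the byte order. */
static uint64_t byteswap64(uint64_t x)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (x & 0xff);
		x >>= 8;
	}
	return r;
}

/* Hypothetical: the inode number reported to the NFS client. */
static uint64_t exported_ino(uint64_t subvol, uint64_t ino)
{
	return ino ^ byteswap64(subvol);
}

static void demo_subvol_roots(void)
{
	uint64_t parent = 256, child = 257;	/* hypothetical subvolume ids */

	/*
	 * Unmixed, both roots would report SUBVOL_ROOT_INO, so the child
	 * root always matches an ancestor.  Mixed, they differ.
	 */
	assert(exported_ino(parent, SUBVOL_ROOT_INO) == 0x0001000000000100ULL);
	assert(exported_ino(child, SUBVOL_ROOT_INO) == 0x0101000000000100ULL);
}
```

Because the byte swap is a bijection, two different subvolume ids can
never map the same inode number to the same result, so the roots of
distinct subtrees always end up distinct under this scheme.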

> 
> If not, maybe it is a better balance if we combinate 22bit subvol id and
> 42 bit inode?

This would be better except when it is worse.  We cannot know which will
happen more often.

As long as BTRFS allows object-ids and root-ids combined to use more
than 64 bits there can be no perfect solution.  There are many possible
solutions that will be close to perfect in practice.  swab64() is the
simplest that I could think of.  Picking any arbitrary cut-off (22/42,
24/40, ...) is unlikely to be better, and could in some circumstances be
worse.
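
A small sketch of "better except when it is worse", with one collision
for each scheme.  split_22_42() and swab_xor() are hypothetical helpers,
XOR as the combining step is an assumption, and the ids are chosen only
to force the collisions.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's swab64(): reverse the byte order. */
static uint64_t byteswap64(uint64_t x)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (x & 0xff);
		x >>= 8;
	}
	return r;
}

/* Hypothetical fixed split: top 22 bits subvolume id, low 42 bits inode. */
static uint64_t split_22_42(uint64_t subvol, uint64_t ino)
{
	return (subvol << 42) | (ino & ((1ULL << 42) - 1));
}

/* Hypothetical swab-based mix: XOR with the byte-swapped subvolume id. */
static uint64_t swab_xor(uint64_t subvol, uint64_t ino)
{
	return ino ^ byteswap64(subvol);
}

static void demo_collisions(void)
{
	/* Subvolume id needing 23 bits: the fixed split loses the top bit. */
	assert(split_22_42(1, 256) == split_22_42(0x400001, 256));
	assert(swab_xor(1, 256) != swab_xor(0x400001, 256));

	/* Inode number needing 58 bits: it overlaps the swapped id instead. */
	assert(swab_xor(1, 256) == swab_xor(3, 0x0200000000000100ULL));
	assert(split_22_42(1, 256) != split_22_42(3, 0x0200000000000100ULL));
}
```

Which of the two cases turns up first depends on whether a given
filesystem runs through subvolume ids or inode numbers faster, and that
is exactly what we cannot know in advance.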

My preference would be for btrfs to start re-using old object-ids and
root-ids, and to enforce a limit (set at mkfs or tunefs) so that the
total number of bits does not exceed 64.  Unfortunately the maintainers
seem reluctant to even consider this.

NeilBrown