On Thu, 19 Aug 2021, Zygo Blaxell wrote:
> On Thu, Aug 19, 2021 at 07:46:22AM +1000, NeilBrown wrote:
> >
> > Remember what the goal is. Most apps don't care at all about duplicate
> > inode numbers - only a few do, and they only care about a few inodes.
> > The only bug I actually have a report of is caused by a directory having
> > the same inode as an ancestor. i.e. in lots of cases, duplicate inode
> > numbers won't be noticed.
>
> rsync -H and cpio's hardlink detection can be badly confused. They will
> think distinct files with the same inode number are hardlinks. This could
> be bad if you were making backups (though if you're making backups over
> NFS, you are probably doing something that could be done better in a
> different way).

Yes, they could get confused. Inode numbers remain unique within a
"subvolume", so you would need to do a backup of multiple subtrees to
hit a problem. Certainly possible, but probably less common.

> 40 bit inodes would take about 20 years to collide with 24-bit subvols--if
> you are creating an average of 1742 inodes every second. Also at the
> same time you have to be creating a subvol every 37 seconds to occupy
> the colliding 25th bit of the subvol ID. Only the highest inode number
> in any subvol counts--if your inode creation is spread out over several
> different subvols, you'll need to make inodes even faster.
>
> For reference, my high scores are 17 inodes per second and a subvol
> every 595 seconds (averaged over 1 year). Burst numbers are much higher,
> but one has to spend some time _reading_ the files now and then.
>
> I've encountered other btrfs users with two orders of magnitude higher
> inode creation rates than mine. They are barely squeaking under the
> 20-year line--or they would be, if they were creating snapshots 50 times
> faster than they do today.

I do like seeing concrete numbers, thanks. How many of these inodes and
subvols remain undeleted? Supposing inode numbers were reused, how many
bits might you need?

> > My preference would be for btrfs to start re-using old object-ids and
> > root-ids, and to enforce a limit (set at mkfs or tunefs) so that the
> > total number of bits does not exceed 64. Unfortunately the maintainers
> > seem reluctant to even consider this.
>
> It was considered, implemented in 2011, and removed in 2020. Rationale
> is in commit b547a88ea5776a8092f7f122ddc20d6720528782 "btrfs: start
> deprecation of mount option inode_cache". It made file creation slower,
> and consumed disk space, iops, and memory to run. Nobody used it.
> Newer on-disk data structure versions (free space tree, 2015) didn't
> bother implementing inode_cache's storage requirement.

Yes, I saw that. Providing reliable functionality certainly can impact
performance and consume disk space. That isn't an excuse for not doing
it. I suspect that carefully tuned code could result in typical creation
times being unchanged, and mean creation times suffering only a tiny
cost. Using "max+1" when the creation rate is particularly high might be
a reasonable part of managing costs. Storage cost need not be worse than
the cost of tracking free blocks on the device.

"Nobody used it" is odd. It implies it would have to be explicitly
enabled, and all it would provide anyone is sane behaviour. Who would
imagine that to be an optional extra?

NeilBrown
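
P.S. A few sketches to make the above concrete. First, the hardlink
detection Zygo mentions: rsync -H and cpio decide that two paths are
the same file from nothing more than a matching (st_dev, st_ino) pair,
so an export that flattens several subvolumes onto one st_dev can make
distinct files look like hardlinks. A minimal illustration of that
keying (not rsync's actual code, just the shape of the test):

  #include <stdio.h>
  #include <stdbool.h>
  #include <sys/stat.h>

  /* Same device plus same inode number is taken to mean same file.
   * Two distinct files on different subvolumes, exported with one
   * st_dev and colliding st_ino values, pass this test falsely. */
  static bool looks_like_hardlink(const char *a, const char *b)
  {
          struct stat sa, sb;

          if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
                  return false;
          return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
  }

  int main(int argc, char **argv)
  {
          if (argc == 3)
                  printf("%s\n", looks_like_hardlink(argv[1], argv[2])
                         ? "same file (by dev:ino)" : "distinct files");
          return 0;
  }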
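
Second, the 20-year arithmetic, as a check anyone can re-run with their
own creation rates (plain C, nothing btrfs-specific; the 20-year horizon
and Julian-year length are assumptions on my part):

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          const double year = 365.25 * 24 * 3600;  /* Julian year, secs */
          const double horizon = 20 * year;        /* the 20-year line */

          /* Inode creation rate that exhausts 2^40 numbers in 20 years,
           * and the subvol interval that exhausts 2^24 ids in the same
           * time.  Prints roughly 1742 and 37.6, matching the figures
           * quoted above. */
          printf("inodes per second: %.0f\n",
                 (double)(UINT64_C(1) << 40) / horizon);
          printf("seconds per subvol: %.1f\n",
                 horizon / (double)(UINT64_C(1) << 24));
          return 0;
  }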
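
Finally, the 64-bit budget I am asking for is no more than this packing;
SUBVOL_BITS stands in for whatever split mkfs or tunefs would fix (24
matches Zygo's example), and the whole thing only works if id re-use
keeps each field within its limit. Purely illustrative, not a proposed
on-disk or wire format:

  #include <stdio.h>
  #include <stdint.h>

  #define SUBVOL_BITS 24                  /* assumed, set at mkfs/tunefs */
  #define OBJECT_BITS (64 - SUBVOL_BITS)

  /* Fold a root-id (subvol) and an object-id (inode within the subvol)
   * into one 64-bit inode number.  Ids must already fit their fields. */
  static uint64_t pack_ino(uint64_t root_id, uint64_t object_id)
  {
          return (root_id << OBJECT_BITS) | object_id;
  }

  int main(void)
  {
          /* subvol 5, object 257 -> 0x50000000101 */
          printf("%#llx\n", (unsigned long long)pack_ino(5, 257));
          return 0;
  }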