Re: Question about XFS_MAXINUMBER

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 19 Mar 2018 10:02:59 +1100

On Sun, Mar 18, 2018 at 08:21:16AM +0200, Amir Goldstein wrote:
> On Sat, Mar 17, 2018 at 11:28 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Sat, Mar 17, 2018 at 09:56:19AM +0200, Amir Goldstein wrote:
> >> On Sat, Mar 17, 2018 at 7:40 AM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >> > On Fri, Mar 16, 2018 at 11:24 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >> >> On Fri, Mar 16, 2018 at 04:05:22PM +0200, Amir Goldstein wrote:
> >> >>> Hi guys,
> >> >>>
> >> >>> I am trying to get a lower bound for unused inode number MSB on
> >> >>> a mounted xfs super block, so I can publish it on struct super_block.
> >> >>
> >> >> Sorry, what?
> >> >>
> >> >> The inode number is owned by the filesystem - nobody should be
> >> >> touching it or making assumptions they can screw with it in any way.
> >> >>
> >>
> >> Let me clarify with the simplest example:
> >>
> >> With overlay of 2 layers, lower and upper on 2 different xfs fs
> >> assuming that stat(2) from xfs will not be using the 63 MSB:
> >>
> >> On stat(2) of an overlay upper inode we want to return:
> >>   st_dev = <overlay anon bdev>
> >>   st_ino = <real upper st_ino>
> >>
> >> On stat(2) of an overlay lower inode we want to return:
> >>   st_dev = <overlay anon bdev>
> >>   st_ino = <real lower st_ino> | 1 << 63

[....]

> I should have mentioned that "foo" is a pure upper - a file that was created
> as upper and let's suppose the real ino of "foo" in upper fs is 10.
> And let's suppose that the real ino of "bar" on lower fs is also 10, which is
> possible when lower fs is a different fs than upper fs.

Ok, so to close the loop: the problem is that overlay has no inode
number space of its own, nor does it have any persistent inode
number mapping scheme. Hence overlay has no way of providing
userspace with a consistent, unique {dev,ino #} tuple when its
different directories lie on different filesystems.
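
To make the collision concrete, here is a minimal sketch using the
numbers from the "foo"/"bar" example quoted above. It is not overlayfs
code: struct dev_ino, ovl_report() and same_file() are hypothetical
names made up for illustration, and the model assumes overlay reports
its own anon bdev while passing the real inode number through unchanged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the {st_dev, st_ino} pair seen via stat(2). */
struct dev_ino {
	uint64_t dev;
	uint64_t ino;
};

/*
 * Assumed model: overlay reports its own anon bdev for every file and
 * passes the real inode number of the underlying fs through unchanged.
 */
static struct dev_ino ovl_report(uint64_t ovl_dev, uint64_t real_ino)
{
	struct dev_ino r = { .dev = ovl_dev, .ino = real_ino };
	return r;
}

/* Userspace treats two files as the same object when both fields match. */
static bool same_file(struct dev_ino a, struct dev_ino b)
{
	return a.dev == b.dev && a.ino == b.ino;
}
```

With "foo" at ino 10 on the upper fs and "bar" at ino 10 on the lower
fs, ovl_report() returns identical tuples for both, so same_file()
reports two distinct files as one.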

[....]

> > across different superblocks? Why do you need special "inode number
> > bits" exposed to userspace to identify upper->lower inode
> > mappings that overlay should already have a persistent mapping
> > mechanism for?
> 
> Because real pure upper inode and lower inode can have the same
> inode number and we want to multiplex our way out of this collision.
> 
> Note that we do NOT maintain a data structure for looking up used
> lower/upper inode numbers, nor do we want to maintain a persistent
> data structure for persistent overlay inode numbers that map to
> real underlying inodes. AFAIK, aufs can use a small db for its 'xino'
> feature. This is something that we wish to avoid.

So instead of maintaining your own data structure to provide the
necessary guarantees, the solution is to steal bits from the
underlying filesystem inode numbers on the assumption that they will
never use them?

What happens when a user upgrades their kernel, the underlying fs
changes all its inode numbers because it's done some virtual
mapping thing for, say, having different inode number ranges for
separate mount namespaces? And so instead of having N bits of free
inode number space before upgrade, it now has zero? How will overlay
react to this sort of change, given it could expose duplicate inode
numbers....
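
A minimal sketch of that failure mode, assuming the proposed scheme
sets bit 63 on lower inode numbers and leaves upper ones untouched.
OVL_LOWER_BIT and the helper names are hypothetical, chosen only for
this illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the bit overlay would borrow from the fs. */
#define OVL_LOWER_BIT (UINT64_C(1) << 63)

/* Proposed scheme: tag lower inode numbers with the borrowed bit. */
static uint64_t ovl_lower_ino(uint64_t real_lower_ino)
{
	return real_lower_ino | OVL_LOWER_BIT;
}

/* Upper inode numbers are passed through unchanged. */
static uint64_t ovl_upper_ino(uint64_t real_upper_ino)
{
	return real_upper_ino;
}

/* The scheme is only sound while the underlying fs never sets the bit. */
static bool ovl_bit_is_free(uint64_t real_ino)
{
	return (real_ino & OVL_LOWER_BIT) == 0;
}
```

While both filesystems stay below bit 63, ovl_lower_ino(10) and
ovl_upper_ino(10) differ. Once the upper fs starts handing out an
inode number such as (10 | OVL_LOWER_BIT), ovl_upper_ino() of that
number equals ovl_lower_ino(10) and the duplicate is back.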

Quite frankly, I think this "steal bits from the underlying
filesystems" mechanism is a recipe for trouble. If you want to play
these games, you get to keep all the broken bits when filesystems
change the number of available bits.

Given that overlay has a persistent inode numbering problem, why
doesn't overlay just allocate and store its own inode numbers and
other required persistent state in an xattr?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx