Re: Question about XFS_MAXINUMBER

Dave Chinner <david@xxxxxxxxxxxxx> · Sun, 18 Mar 2018 08:28:14 +1100

On Sat, Mar 17, 2018 at 09:56:19AM +0200, Amir Goldstein wrote:
> On Sat, Mar 17, 2018 at 7:40 AM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > On Fri, Mar 16, 2018 at 11:24 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >> On Fri, Mar 16, 2018 at 04:05:22PM +0200, Amir Goldstein wrote:
> >>> Hi guys,
> >>>
> >>> I am trying to get a lower bound for unused inode number MSB on
> >>> a mounted xfs super block, so I can publish it on struct super_block.
> >>
> >> Sorry, what?
> >>
> >> The inode number is owned by the filesystem - nobody should be
> >> touching it or making assumptions they can screw with it in any way.
> >>
> 
> Let me clarify with the simplest example:
> 
> With overlay of 2 layers, lower and upper on 2 different xfs fs
> assuming that stat(2) from xfs will not be using the 63 MSB:
> 
> On stat(2) of an overlay upper inode we want to return:
>   st_dev = <overlay anon bdev>
>   st_ino = <real upper st_ino>
> 
> On stat(2) of an overlay lower inode we want to return:
>   st_dev = <overlay anon bdev>
>   st_ino = <real lower st_ino> | 1 << 63
> 
> Now for ext4 this is always safe to do and we find that automatically
> due to the fact that ext4 uses the default encode_fh generic 32bit
> inode encoding.
> 
> For xfs this should also be safe, but we don't want to whitelist xfs
> by name/magic, so we want xfs to publish the max amount of bits
> exposed to user with stat(2)/getdents(3).
> 
> Recently, I became aware of an nfsd use case that also looks
> at inode->i_ino, so we may want to also be able to assume
> max_ino_bits also applies to inode->i_ino, but if you tell us to
> stay clear of inode->i_ino, then we can always use stat.st_ino.
> 
> Thanks,
> Amir.
> 

On Sat, Mar 17, 2018 at 10:24:39AM +0200, Amir Goldstein wrote:
> On Sat, Mar 17, 2018 at 10:04 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Sat, Mar 17, 2018 at 06:40:23AM +0100, Miklos Szeredi wrote:
> [...]
> >> I ask, because we've thought long and hard about what to do for
> >> multiplexing inum space in overlayfs, and found no other sane options.
> >> Ideas welcome, of course.
> >
> > Why do you need to "multiplex" the inum space? perhaps you'd do
> > better to start with a description of why you want to play games
> > with inode numbers, rather than just posting a patch to steal bits
> > from other filesytem inode number spaces....
> >
> 
> I think this patch perhaps explains best what we want to do:
> https://marc.info/?l=linux-unionfs&m=151007386219743&w=2
> 
> I had already given a simple example in an earlier response.

So, I'll quote that here:

> > > On stat(2) of an overlay upper inode we want to return:
> > >   st_dev = <overlay anon bdev>
> > >   st_ino = <real upper st_ino>
> > > 
> > > On stat(2) of an overlay lower inode we want to return:
> > >   st_dev = <overlay anon bdev>
> > >   st_ino = <real lower st_ino> | 1 << 63

This makes no sense to me - this implies the inode number changes on
copy-up, and ....

> As the the "why" question, we have several requirements for
> overlay inode numbers:
> 1. st_ino is persistent
> 2. st_ino/st_dev pair is unique in the system
> 3. st_ino is consistent with d_ino
> 4. st_ino doesn't change on copy up
> 5. st_dev is uniform across all overlay inodes

.... this means requierment #4 isn't met, even on the same
filesystem.

IOWs, if overlay has already met #4 on the same filesystem, then
there is a persistent mapping between lower and upper inodes (Req.
#1) that maps the upper inode # to the lower inode #. That has to be
overlay information, because the underlying filesystem doesn't store
it. And because the lower inode/dev is unique, then req. 2 is met,
too.

FWIW, req 5 is badly worded - st_dev is uniform across all inodes in
a single overlay filesystem, not all overlay inodes.

> With upstream overlayfs we meet all requirements above for
> the case of all underlying layers on the same fs, by using a real
> underlying inode st_ino and the overlay st_dev.

Yeah, that's what I thought. So why can't you do exactly the same
thing for different underlying filesystems? You've already got a
mapping between upper and lower inode numbers, why can't that map
across different superblocks? Why do you need special "inode number
bits" exposed to userspace to identify upper->lower inode
mappings that overlay should already have a persistent mapping
mechanism for?

> With the 'xino' patch set [1], we can meet all requirements above
> also for the case of underlying layers on different fs, by multiplpexing
> the inum space, as long as we know about unused high ino bits.

Your example makes no sense to me - I don't see how adding extra
bits to the lower inode number allows you to meet requirement #4,
not why presenting "st_ino = <real upper st_ino>" for inodes that
have been copied up iis being done because that violates requirement
#4....

> The ovl-xino branch already has the xfs patch (not yet posted) to publish
> max_ino_bits.

That has no explanation of why you need to screw with inode number
bits, either. It's all mechanism, and there's zero explanation of
what problem it solves.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html