Re: [PATCH v2] BTRFS/NFSD: provide more unique inode number for btrfs export

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 06 Sep 2021, J. Bruce Fields wrote:
> On Thu, Sep 02, 2021 at 02:14:17PM +1000, NeilBrown wrote:
> > On Thu, 02 Sep 2021, J. Bruce Fields wrote:
> > > I looked back through a couple threads to try to understand why we
> > > couldn't do that (on new filesystems, with a mkfs option to choose new
> > > or old behavior) and still don't understand.  But the threads are long.
> > > 
> > > There are objections to a new mount option (which seem obviously wrong;
> > > this should be a persistent feature of the on-disk filesystem).
> > 
> > I hadn't thought much (if at all) about a persistent filesystem feature
> > flag.  I'll try that now.
> > 
> > There are two features of interest.  One is completely unique inode
> > numbers, the other is reporting different st_dev for different
> > subvolumes.  I think these need to be kept separate, though the second
> > would depend on the first.  They would be similar to my "inumbits" and
> > "numdevs" mount options, though with less flexibility.  I think that
> > they would need strong semantics to be acceptable - "mostly unique"
> > isn't really acceptable once we are changing the on-disk data.
> 
> I don't quite follow that.

I agree it is a bit of a leap.

> 
> Also the "on-disk data" here is literally just one more flag bit in some
> superblock field, right?

Maybe.  I *could* be just one bit.
But even "just one bit" is, I think, more of a support commitement than
adding a mount option.
Mount options are fairly obvious to the user.  super-blocks not as much.
So "just one bit" might still be "one more question" than the supoort
people need to ask when handling a problem report.

When I wrote that I was thinking about how I would be comfortable with
if I were a btrfs maintainer.  And I don't think I'd like to spend and
on-disk change and only gain a "mostly harmless" solution.

Christoph's comment about possible vulnerabilities are probably part of
this.  I think that over NFS, concern about a user being able to
synthesise an inode number conflict is probably "mountain out of mole
hill" territory.  However for local access, I cannot convince myself
that it won't be a problem.  I can imagine (incautiously written)
auditing scans getting confused, and while auditing over NFS doesn't
make much sense, auditing locally does.

> 
> > I believe that some code *knows* that the root of any btrfs subvolumes
> > has inode number 256.  systemd seems to use this.  I have no idea what
> > else might depend on inode numbers in some way.
> 
> Looking.  Ugh, yes, there's  abtrfs_might_be_subvol that takes a struct
> stat and returns:
> 
>         return S_ISDIR(st->st_mode) && st->st_ino == 256;
> 
> I wonder why it does that?  Are there situations where all it has is a
> file descriptor (so it can't easily compare st_dev with the parent?)
> And if you NFS-export and wanted to answer the same question on the
> client side, I wonder what you'd do.

There are also a few references to BTRFS_FIRST_FREE_OBJECTID which is
256.

Uses seem to include:
 - managing quotas, which fits with my idea that subvols are like
   project-quota trees.
 - optionally preventing "rm -r" from removing subvols
 - some switching to/from "readonly" which I cannot follow
 - some special handling of user home-directories when they are
   subvols

These are probably reasonable and do point to subvols being a little bit
like separate filesytems.  These would break if we changed local inode
numbers. 

The project-quota management and the read-only setting are not available
via NFS so changing the inode number seen that way is not likely to
matter as much.  In any case, detecting "256" is only useful if you can
also detect "is btrfs", and you cannot do that of NFS.

Once upon a time ext[234] had a set of inode flags and xfs separately
had a bunch of inode flags.  These are now unified (to a degree) in
'struct fsattr' accessed by FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR.

btrfs supports that interface, but doesn't appear to have extended it
for subvol-specific things - preferring to create btrfs-specific ioctls
instead.   Maybe they weren't designed to be extensible enough.

Maybe what we really need is for a bunch of diverse filesystem
developers to get together and agree on some new common interface for
subvolume management, including coming up with some sort of definition
of what a subvolume "is".

Until that happens (and the new interfaces are implemented and widely
used) I can only see two possible solutions to the current
NFS-export-of-btrfs problem:

1/ change nfsd to export a different inode number to the one btrfs uses
   (or maybe a different fsid, but that is problematic in other ways)
2/ change userspace to check filehandles and not assume two things are
   the same if their filehandles are different.

Maybe I should write a patch for fts_read() and see how much glibc folk
will hate it.

NeilBrown



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux