On 2021-07-30 17:48, Josef Bacik wrote:
On 7/30/21 11:17 AM, J. Bruce Fields wrote:
On Fri, Jul 30, 2021 at 02:23:44PM +0800, Qu Wenruo wrote:
OK, I forgot it's an opt-in feature, so it's less of an impact.
But it can still sometimes be problematic.
E.g. if the user wants to put some git code into one subvolume, while
exporting another subvolume through NFS.
Then the user has to opt in, causing the git subvolume to lose the
ability to determine the subvolume boundary, right?
Totally naive question: would it be possible to treat different subvolumes
differently, and give the user some choice at subvolume creation time
how this new boundary should behave?
It seems like there are some conflicting priorities that can only be
resolved by someone who knows the intended use case.
This is the crux of the problem. We have no real interfaces or anything
to deal with this sort of paradigm. We do the st_dev thing because
that's the most common way tools like find or rsync
determine they've wandered into a "different" volume. This exists
specifically because of use cases like Zygo's, where he's taking
thousands of snapshots and manually excluding them from find/rsync is
just not reasonable.
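
For readers less familiar with the st_dev convention mentioned here, a minimal Python sketch of a walker that stays on one device, in the spirit of find -xdev. The function names are made up for illustration; this is not how find or rsync are actually implemented, only the same comparison they rely on.

```python
import os


def same_device(parent_dev: int, child_dev: int) -> bool:
    """Two stat results belong to the same 'volume' if st_dev matches."""
    return parent_dev == child_dev


def walk_one_device(root: str) -> list[str]:
    """Return root and every directory below it that shares root's st_dev.

    Directories reporting a different st_dev (another mount, or a Btrfs
    subvolume or snapshot) are skipped and never descended into.
    """
    root_dev = os.lstat(root).st_dev
    found = []
    stack = [root]
    while stack:
        current = stack.pop()
        found.append(current)
        with os.scandir(current) as entries:
            for entry in entries:
                if not entry.is_dir(follow_symlinks=False):
                    continue
                child_dev = entry.stat(follow_symlinks=False).st_dev
                if same_device(root_dev, child_dev):
                    stack.append(entry.path)
    return found
```

With thousands of snapshots below the root, each one reports its own st_dev and is pruned by that single comparison, with no exclude list needed.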
We have no good way to give the user information about what's going on,
we just have these old shitty interfaces. I asked our guys about
filling up /proc/self/mountinfo with our subvolumes and they had a heart
attack because we have around 2-4k subvolumes on machines, and with
monitoring stuff in place we regularly read /proc/self/mountinfo to
determine what's mounted and such.
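
To make the monitoring cost concrete, here is a rough Python sketch of the kind of parsing such a tool does on /proc/self/mountinfo. The field layout follows proc(5); the MountEntry class and parse_mountinfo name are hypothetical, and a real monitor will differ.

```python
from dataclasses import dataclass


@dataclass
class MountEntry:
    mount_id: int
    parent_id: int
    device: str       # "major:minor", the value files on this mount report as st_dev
    root: str         # path within the filesystem that forms the root of this mount
    mount_point: str
    fs_type: str
    source: str


def parse_mountinfo(text: str) -> list[MountEntry]:
    """Parse the contents of /proc/self/mountinfo into MountEntry records."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        try:
            # Optional fields (e.g. "shared:1") end at a lone "-" separator.
            sep = fields.index("-", 6)
        except ValueError:
            continue  # not a well-formed mountinfo line
        if len(fields) < sep + 3:
            continue
        entries.append(MountEntry(
            mount_id=int(fields[0]),
            parent_id=int(fields[1]),
            device=fields[2],
            root=fields[3],
            mount_point=fields[4],
            fs_type=fields[sep + 1],
            source=fields[sep + 2],
        ))
    return entries
```

Usage would be something like parse_mountinfo(open("/proc/self/mountinfo").read()). Every subvolume listed there becomes one more line to read and parse on each poll.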
And then there's NFS which needs to know that it's walked into a new
inode space.
This is all super shitty, and mostly exists because we don't have a good
way to expose to the user wtf is going on.
Personally I would be ok with simply disallowing NFS to wander into
subvolumes from an exported fs. If you want to export subvolumes then
export them individually, otherwise if you walk into a subvolume from
NFS you simply get an empty directory.
This doesn't solve the mountinfo problem where a user may want to figure
out which subvol they're in, but this is where I think we could address
the issue with better interfaces. Or perhaps Neil's idea to have a
common major number with a different minor number for every subvol.
Either way this isn't as simple as shoehorning it into automount and
being done with it, we need to take a step back and think about how
this should actually look, taking into account we've got 12 years of
having Btrfs deployed with existing use cases that expect a certain
behavior. Thanks,
Josef
As a user and sysadmin I really appreciate the way Btrfs currently works.
We use hourly snapshots which are exposed over Samba as "Previous
Versions" to Windows users. This amounts to thousands of snapshots, all
user serviceable. A great feature!
In the Samba world we have a mount option[1] called "noserverino" which lets
the client generate unique inode numbers, rather than using the server
provided inode numbers. This allows Linux clients to work well against
servers exposing subvolumes and snapshots.
NFS has really old roots and had to make choices that we don't really
have to make today. Could we not provide something similar to mount.cifs
that generates unique inode numbers for the clients? This could be either
an nfsd export option (such as /mnt/foo *(rw,uniq_inodes)) or a mount
option on the clients.
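
Purely to illustrate the idea, and not how mount.cifs or nfsd actually work: a Python sketch of a table that hands out its own inode numbers, keyed on the pair (subvolume id, server inode). The InodeRemapper class and its method are hypothetical, as is the uniq_inodes option name above.

```python
import itertools


class InodeRemapper:
    """Hand out locally unique inode numbers for (subvolume, inode) pairs."""

    def __init__(self, first: int = 2):
        self._next = itertools.count(first)
        self._table: dict[tuple[int, int], int] = {}

    def local_ino(self, subvol_id: int, server_ino: int) -> int:
        """Return a stable number for this file, allocating one on first use."""
        key = (subvol_id, server_ino)
        if key not in self._table:
            self._table[key] = next(self._next)
        return self._table[key]
```

The same server inode number seen in two different subvolumes then maps to two different numbers, while repeated lookups of one file stay stable. A real implementation would also have to deal with persistence and table size, which this sketch ignores.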
One worry I have with making subvolumes automount points is that it might
affect the ability to cp --reflink across that boundary.
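
For anyone wanting to check this on their own setup, a small Python sketch that attempts the same clone operation cp --reflink relies on, the FICLONE ioctl, and reports the error name if it fails. The try_reflink name is made up, and the constant is copied by hand from linux/fs.h, so treat it as an assumption to verify. Which source/destination pairs the kernel accepts has varied between kernel versions, so I make no claim about the result on any particular one.

```python
import errno
import fcntl

# _IOW(0x94, 9, int) from <linux/fs.h>; hard-coded here as an assumption.
FICLONE = 0x40049409


def try_reflink(src_path: str, dst_path: str):
    """Try to clone src into dst. Return None on success, else an errno name."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
        except OSError as exc:
            # EXDEV here is what cp reports as "Invalid cross-device link".
            return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

Running it with a source in one subvolume and a destination in another, once with the subvolume reached by plain path and once through a separate mount, would show whether the boundary matters.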
[1] https://www.samba.org/~ab/output/htmldocs/manpages-3/mount.cifs.8.html