On 7/30/21 11:17 AM, J. Bruce Fields wrote:
On Fri, Jul 30, 2021 at 02:23:44PM +0800, Qu Wenruo wrote:
OK, forgot it's an opt-in feature, then it's less an impact.
But it can still sometimes be problematic.
E.g. if the user wants to put some git code into one subvolume while
exporting another subvolume through NFS.
Then the user has to opt in, causing the git subvolume to lose the
ability to determine subvolume boundaries, right?
Totally naive question: would it be possible to treat different subvolumes
differently, and give the user some choice at subvolume creation time
how this new boundary should behave?
It seems like there are some conflicting priorities that can only be
resolved by someone who knows the intended use case.
This is the crux of the problem. We have no real interfaces or anything to deal
with this sort of paradigm. We do the st_dev thing because that's the most
common way that tools like find or rsync determine they've wandered into
a "different" volume. This exists specifically because of use cases like
Zygo's, where he's taking thousands of snapshots and manually excluding them
from find/rsync is just not reasonable.
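
For illustration, here is a minimal sketch of the kind of st_dev check
described above: a directory walk that stays on one filesystem by comparing
each subdirectory's st_dev with the root's, roughly what find -xdev does.
The function names are hypothetical, and this is not how find or rsync are
actually implemented.

```python
import os


def crosses_boundary(parent, child):
    """Return True if child reports a different st_dev than parent."""
    return os.lstat(parent).st_dev != os.lstat(child).st_dev


def walk_one_filesystem(root):
    """Yield directories under root, skipping any whose st_dev differs.

    A subvolume that reports its own st_dev is pruned here, so thousands
    of snapshots are excluded without listing them by hand.
    """
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, _filenames in os.walk(root):
        # Prune in place so os.walk never descends past a boundary.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        yield dirpath
```

If every subvolume shared the parent's st_dev, this walk would descend into
all of them.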
We have no good way to give the user information about what's going on; we just
have these old shitty interfaces. I asked our guys about filling up
/proc/self/mountinfo with our subvolumes and they had a heart attack because we
have around 2-4k subvolumes on machines, and with monitoring stuff in place we
regularly read /proc/self/mountinfo to determine what's mounted and such.
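
To make the cost concrete, here is a rough sketch of what a monitoring tool
does on each read of /proc/self/mountinfo: one parse per line, so 2-4k extra
subvolume entries means 2-4k extra lines every time. The field layout follows
proc(5); the parser name and the sample line are hypothetical.

```python
def parse_mountinfo(text):
    """Parse mountinfo-formatted text into a list of dicts, one per mount."""
    mounts = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # A lone "-" separates the per-mount fields from the fs fields.
        left, _, right = line.partition(" - ")
        fields = left.split()
        fs_fields = right.split()
        major, minor = fields[2].split(":")
        mounts.append({
            "mount_id": int(fields[0]),
            "parent_id": int(fields[1]),
            "major": int(major),
            "minor": int(minor),
            "root": fields[3],          # path within the filesystem
            "mount_point": fields[4],
            "fstype": fs_fields[0],
            "source": fs_fields[1],
        })
    return mounts


# Hypothetical example line, not taken from a real machine.
sample = ("36 35 0:42 /snapshots/a /mnt/snap rw,noatime shared:1 - "
          "btrfs /dev/sda1 rw,subvolid=257,subvol=/snapshots/a")
entries = parse_mountinfo(sample)
```

On a live system the input would come from
open("/proc/self/mountinfo").read().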
And then there's NFS which needs to know that it's walked into a new inode space.
This is all super shitty, and mostly exists because we don't have a good way to
expose to the user wtf is going on.
Personally I would be ok with simply not allowing NFS to wander into subvolumes
from an exported fs. If you want to export subvolumes then export them
individually, otherwise if you walk into a subvolume from NFS you simply get an
empty directory.
This doesn't solve the mountinfo problem where a user may want to figure out
which subvol they're in, but this is where I think we could address the issue
with better interfaces. Or perhaps Neil's idea to have a common major number
with a different minor number for every subvol.
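
As a sketch of how userspace might consume such a scheme, assuming every
subvolume of one filesystem reported the same major and its own minor: the
two helpers below are hypothetical and only show the comparison a tool could
make, not anything Btrfs does today.

```python
import os


def same_filesystem(dev_a, dev_b):
    """Assumed scheme: a shared major number means the same filesystem."""
    return os.major(dev_a) == os.major(dev_b)


def same_subvolume(dev_a, dev_b):
    """Identical major and minor means the same subvolume."""
    return dev_a == dev_b
```

The dev_a and dev_b arguments would be st_dev values taken from os.stat().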
Either way, this isn't as simple as shoehorning it into automount and being done
with it. We need to take a step back and think about how this should actually
look, taking into account that we've got 12 years of having Btrfs deployed with
existing use cases that expect a certain behavior. Thanks,
Josef