Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly

Josef Bacik <josef@xxxxxxxxxxxxxx> · Fri, 30 Jul 2021 11:48:15 -0400

On 7/30/21 11:17 AM, J. Bruce Fields wrote:
> On Fri, Jul 30, 2021 at 02:23:44PM +0800, Qu Wenruo wrote:
>> OK, forgot it's an opt-in feature, then it's less of an impact.
>>
>> But it can still sometimes be problematic.
>>
>> E.g. if the user wants to put some git code into one subvolume, while
>> exporting another subvolume through NFS.
>>
>> Then the user has to opt in, causing the git subvolume to lose the
>> ability to determine subvolume boundary, right?
>
> Totally naive question: would it be possible to treat different subvolumes
> differently, and give the user some choice at subvolume creation time
> how this new boundary should behave?
>
> It seems like there are some conflicting priorities that can only be
> resolved by someone who knows the intended use case.

This is the crux of the problem.  We have no real interfaces or anything to deal
with this sort of paradigm.  We give each subvolume its own st_dev because that's
the most common way that tools like find or rsync determine they've wandered into
a "different" volume.  This exists specifically because of use cases like
Zygo's, where he's taking thousands of snapshots and manually excluding them
from find/rsync is just not reasonable.
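
For anyone not familiar with the st_dev check, here is a rough sketch of what such tools do, loosely in the spirit of find -xdev.  The function name walk_one_device and its injectable lstat parameter are made up for illustration; this is not how find or rsync are actually implemented.

```python
import os
import stat

def walk_one_device(root, lstat=os.lstat):
    """Yield paths under root, skipping any directory whose st_dev
    differs from root's (e.g. a snapshot in another subvolume)."""
    root_dev = lstat(root).st_dev
    stack = [root]
    while stack:
        current = stack.pop()
        yield current
        try:
            names = sorted(os.listdir(current))
        except OSError:
            continue  # unreadable directory: skip it
        for name in names:
            path = os.path.join(current, name)
            try:
                st = lstat(path)
            except OSError:
                continue  # entry vanished or is inaccessible
            if not stat.S_ISDIR(st.st_mode):
                yield path
            elif st.st_dev == root_dev:
                # Same device number: still inside the same "volume".
                stack.append(path)
            # Different st_dev: treated as a boundary and not entered.
```

With thousands of snapshots under one tree, this single comparison is what keeps a walk from visiting every snapshot.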

We have no good way to give the user information about what's going on; we just
have these old shitty interfaces.  I asked our guys about filling up
/proc/self/mountinfo with our subvolumes and they had a heart attack because we
have around 2-4k subvolumes on machines, and with monitoring stuff in place we
regularly read /proc/self/mountinfo to determine what's mounted and such.
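
To make the monitoring cost concrete, this is roughly what a tool does every time it reads that file.  It is a minimal sketch assuming the field layout documented in proc(5); MountEntry and parse_mountinfo are illustrative names, not part of any existing tool.

```python
from collections import namedtuple

MountEntry = namedtuple(
    "MountEntry",
    "mount_id parent_id dev root mount_point fstype source",
)

def parse_mountinfo(text):
    """Parse the contents of /proc/self/mountinfo into MountEntry tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue  # malformed or empty line
        # Optional fields (e.g. "shared:5") end at a lone "-" separator.
        sep = fields.index("-")
        if sep < 6 or len(fields) < sep + 3:
            continue
        entries.append(MountEntry(
            mount_id=int(fields[0]),
            parent_id=int(fields[1]),
            dev=fields[2],          # "major:minor"
            root=fields[3],         # for btrfs, the subvolume path
            mount_point=fields[4],
            fstype=fields[sep + 1],
            source=fields[sep + 2],
        ))
    return entries
```

Every additional subvolume exposed as a mount adds one more line to parse on every read, which is where 2-4k entries start to hurt.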

And then there's NFS which needs to know that it's walked into a new inode space.

This is all super shitty, and mostly exists because we don't have a good way to 
expose to the user wtf is going on.

Personally I would be ok with simply not allowing NFS to wander into subvolumes
from an exported fs.  If you want to export subvolumes then export them
individually; otherwise, if you walk into a subvolume from NFS you simply get an
empty directory.

This doesn't solve the mountinfo problem where a user may want to figure out
which subvol they're in, but this is where I think we could address the issue
with better interfaces.  Or perhaps with Neil's idea of having a common major
number with a different minor number for every subvol.
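
As a sketch of how userspace could consume the major/minor idea, assuming it were implemented as described (it is only a proposal here): split_dev and same_fs_different_subvol are hypothetical helpers, and the "same major means same filesystem" rule is an assumption that does not hold for st_dev values on current kernels.

```python
import os

def split_dev(st_dev):
    """Split a raw st_dev value into its (major, minor) pair."""
    return os.major(st_dev), os.minor(st_dev)

def same_fs_different_subvol(dev_a, dev_b):
    """Hypothetical rule under the proposal: a shared major number means
    the same filesystem, a differing minor means a different subvolume."""
    major_a, minor_a = split_dev(dev_a)
    major_b, minor_b = split_dev(dev_b)
    return major_a == major_b and minor_a != minor_b
```

A caller would feed it values such as os.lstat(path).st_dev for the two paths being compared.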

Either way this isn't as simple as shoehorning it into automount and being done
with it.  We need to take a step back and think about how this should actually
look, taking into account we've got 12 years of having Btrfs deployed with
existing use cases that expect a certain behavior.  Thanks,

Josef