On Wed, Jul 28, 2021 at 05:30:04PM -0400, Josef Bacik wrote:

> I don't think anybody has that many file systems. For btrfs it's a single
> file system. Think of syncfs, it's going to walk through all of the super
> blocks on the system calling ->sync_fs on each subvol superblock. Now this
> isn't a huge deal, we could just have some flag that says "I'm not real" or
> even just have anonymous superblocks that don't get added to the global
> super_blocks list, and that would address my main pain points.

Umm... Aren't the snapshots read-only by definition?

> The second part is inode reclaim. Again this particular problem could be
> avoided if we had an anonymous superblock that wasn't actually used, but the
> inode lru is per superblock. Now with reclaim instead of walking all the
> inodes, you're walking a bunch of super blocks and then walking the list of
> inodes within those super blocks. You're burning CPU cycles because now
> instead of getting big chunks of inodes to dispose, it's spread out across
> many super blocks.
>
> The other weird thing is the way we apply pressure to shrinker systems. We
> essentially say "try to evict X objects from your list", which means in this
> case with lots of subvolumes we'd be evicting waaaaay more inodes than you
> were before, likely impacting performance where you have workloads that have
> lots of files open across many subvolumes (which is what FB does with it's
> containers).
>
> If we want a anonymous superblock per subvolume then the only way it'll work
> is if it's not actually tied into anything, and we still use the primary
> super block for the whole file system. And if that's what we're going to do
> what's the point of the super block exactly? This approach that Neil's come
> up with seems like a reasonable solution to me. Christoph gets his
> separation and /proc/self/mountinfo, and we avoid the scalability headache
> of a billion super blocks. Thanks,

AFAICS, we also get arseloads of weird corner cases - in particular, Neil's
suggestions re visibility in /proc/mounts look rather arbitrary.

Al, really disliking the entire series...