Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly

On 7/28/21 3:35 PM, J. Bruce Fields wrote:
I'm still stuck trying to understand why subvolumes can't get their own
superblocks:

	- Why are the performance issues Josef raises insurmountable?
	  And why are they unique to btrfs?  (Surely there are other cases
	  where people need hundreds or thousands of superblocks?)


I don't think anybody has that many file systems. For btrfs it's a single file system. Think of syncfs: it's going to walk through all of the superblocks on the system, calling ->sync_fs on each subvol superblock. Now, this isn't a huge deal; we could just have some flag that says "I'm not real", or even just have anonymous superblocks that don't get added to the global super_blocks list, and that would address my main pain points.
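To make the syncfs point concrete, here is a minimal C model (not kernel code) of a walk over a global list of superblocks that invokes a sync callback on each one. The `struct sb` type, its `hidden` field (standing in for the "I'm not real" flag), and the `model_*` functions are all hypothetical; the sketch only counts callbacks to show how the cost grows with the number of per-subvolume superblocks, and how such a flag would cut it back to the one real superblock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of a superblock; not struct super_block. */
struct sb {
    bool hidden;          /* hypothetical "I'm not real" flag from the email */
    struct sb *next;      /* link in a global list, standing in for super_blocks */
};

static unsigned long sync_fs_calls;  /* how many ->sync_fs-style callbacks ran */

/* Stand-in for a filesystem's ->sync_fs callback: it only counts the call. */
static void model_sync_fs(struct sb *sb)
{
    (void)sb;
    sync_fs_calls++;
}

/* Walk the whole global list, calling the callback on each superblock.
 * With skip_hidden set, superblocks flagged as "not real" are skipped. */
static void model_sync_all(struct sb *head, bool skip_hidden)
{
    for (struct sb *sb = head; sb != NULL; sb = sb->next) {
        if (skip_hidden && sb->hidden)
            continue;
        model_sync_fs(sb);
    }
}

/* Link n superblocks into a list: index 0 models the real btrfs superblock,
 * the rest model per-subvolume superblocks flagged as hidden. */
static struct sb *model_link(struct sb *sbs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        sbs[i].hidden = (i != 0);
        sbs[i].next = (i + 1 < n) ? &sbs[i + 1] : NULL;
    }
    return n ? &sbs[0] : NULL;
}
```

For example, with one real superblock plus 1000 hidden subvolume superblocks, a full walk makes 1001 callbacks, while skipping hidden entries makes just one.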

The second part is inode reclaim. Again, this particular problem could be avoided if we had an anonymous superblock that wasn't actually used, but the inode LRU is per superblock. Now, with reclaim, instead of walking all the inodes, you're walking a bunch of superblocks and then walking the list of inodes within each of those superblocks. You're burning CPU cycles because instead of getting big chunks of inodes to dispose of, the work is spread out across many superblocks.
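The reclaim cost can be sketched the same way. This hypothetical model (the `model_lru` type and `model_reclaim` function are illustrative, not kernel APIs) disposes of inodes from per-superblock LRU lists in order and reports how many superblocks it had to visit, treating each visit as one unit of per-list overhead such as lookup and locking.

```c
#include <stddef.h>

/* Hypothetical model: each superblock owns an inode LRU of some length. */
struct model_lru {
    size_t nr_inodes;
};

/* Reclaim up to 'want' inodes by walking superblocks in order and
 * disposing from each per-superblock LRU. Returns the number of
 * superblocks visited; *freed receives the number of inodes freed. */
static size_t model_reclaim(struct model_lru *lrus, size_t nr_sb,
                            size_t want, size_t *freed)
{
    size_t visited = 0;

    *freed = 0;
    for (size_t i = 0; i < nr_sb && *freed < want; i++) {
        size_t take = want - *freed;

        if (take > lrus[i].nr_inodes)
            take = lrus[i].nr_inodes;
        lrus[i].nr_inodes -= take;
        *freed += take;
        visited++;  /* each visit stands for per-superblock list overhead */
    }
    return visited;
}
```

Freeing 1000 inodes from a single 10000-inode LRU takes one visit; freeing the same 1000 from 1000 superblocks holding 10 inodes each takes 100 visits.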

The other weird thing is the way we apply pressure to shrinker systems. We essentially say "try to evict X objects from your list", which means that in this case, with lots of subvolumes, we'd be evicting way more inodes than we were before. That would likely hurt performance for workloads that have lots of files open across many subvolumes (which is what FB does with its containers).
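The shrinker point can be modeled by taking the "try to evict X objects from your list" description literally: a fixed X applied to every per-superblock list. This is a hypothetical sketch (`model_shrink` is illustrative, not a kernel function) and it deliberately does not reproduce how the kernel actually computes shrinker scan targets; it only shows why per-list pressure multiplies with the number of lists.

```c
#include <stddef.h>

/* Evict up to 'x' objects from each of nr_sb per-superblock lists,
 * following the simplified "try to evict X objects from your list"
 * description. lru_len[i] holds the length of list i and is updated
 * in place. Returns the total number of objects evicted. */
static size_t model_shrink(size_t *lru_len, size_t nr_sb, size_t x)
{
    size_t total = 0;

    for (size_t i = 0; i < nr_sb; i++) {
        size_t n = lru_len[i] < x ? lru_len[i] : x;

        lru_len[i] -= n;
        total += n;
    }
    return total;
}
```

With X = 128, one superblock holding 10000 inodes loses 128 of them, but the same 10000 inodes split across 100 subvolume superblocks of 100 inodes each are all evicted.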

If we want an anonymous superblock per subvolume, then the only way it'll work is if it's not actually tied into anything and we still use the primary superblock for the whole file system. And if that's what we're going to do, what's the point of the superblock exactly? The approach Neil has come up with seems like a reasonable solution to me. Christoph gets his separation and /proc/self/mountinfo, and we avoid the scalability headache of a billion superblocks. Thanks,

Josef
