On Mon, Oct 14, 2024 at 01:27:57PM +0000, Hans Holmberg wrote:
> On 14/10/2024 10:33, Christoph Hellwig wrote:
> > On Thu, Oct 10, 2024 at 06:04:17PM -0700, Darrick J. Wong wrote:
> >> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> >> +    __u32 rg_number;    /* i/o: rtgroup number */
> >> +    __u32 rg_length;    /* o: length in blocks */
> >> +    __u32 rg_capacity;  /* o: usable capacity in blocks */
> >
> > So the separate length vs capacity reporting was needed for my previous
> > implementation of zoned devices with LBA gaps.  Now that RT groups
> > always use segmented addressing we shouldn't need it any more.
> >
> > That being said Hans was looking into using the capacity field to
> > optimize data placement in power users like RocksDB, and one thing
> > that might be useful for that is to exclude known fixed metadata from
> > the capacity field, which really is just the rtsb on rtgroup 0.
> >
>
> Yeah, it would be very useful for apps to know the available user capacity
> so that file sizes could be set up to align with that.
>
> When files are mapped to disjoint sets of realtime groups we can avoid garbage
> collection all together. Even if the apps can't align file sizes perfectly to
> the number of user writable blocks, write amplification can be minimized
> by aiming for it.

Hmmm so if I'm understanding you correctly: you want to define
"capacity" to mean "maximum number of blocks available to userspace"?

Does that available block count depend on privilege level (ala ext4
which always hides 5% of the blocks for root)?  I think the answer to
that is 'no' because you're really just reporting the number of LBAs in
that zone that are available to /any/ application program, and there's a
direct mapping from 'available LBAs in a zone' to 'rgblocks available in
a rtgroup'.

But yeah, I agree that it might be nice to know total blocks available
in a particular rtgroup.  Is it useful to track and report the number of
unwritten blocks remaining in that group?

For example, if the rtgroup size is 1024 fsblocks, the zns zone actually
only has 8000 lba == 1000 fsblocks, and you've already written to 200
fsblocks of it, then we'd report:

    rg_length: 1024
    rg_capacity: 1000
    rg_avail: 800

Here the program knows that every 1000*4k bytes it writes will result in
a jump to a new rtgroup; and that the next time this will happen is
after it writes 800*4k bytes more?

(Assume the usual frictionless system with no other writers :P)

--D
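
To make the arithmetic in the example concrete, here is a minimal
userspace sketch in C.  Everything in it is an assumption made for
illustration: the struct name rtgroup_geom_sketch and the helper names
are hypothetical, rg_avail is only the field proposed in this mail and is
not in the posted patch, and the 4096-byte fsblock size is taken from the
"4k" in the example.  It does not call any real ioctl.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the per-rtgroup fields discussed in this thread.
 * rg_avail is only proposed here; it is NOT part of the posted patch.
 */
struct rtgroup_geom_sketch {
	uint32_t rg_number;	/* rtgroup number */
	uint32_t rg_length;	/* length in fsblocks */
	uint32_t rg_capacity;	/* usable capacity in fsblocks */
	uint32_t rg_avail;	/* proposed: unwritten fsblocks remaining */
};

/* Bytes a lone writer can still write before it moves to a new rtgroup. */
static uint64_t bytes_until_next_rtgroup(const struct rtgroup_geom_sketch *rg,
					 uint32_t fsblock_size)
{
	/* Sanity checks on the assumed relationship between the fields. */
	assert(rg->rg_avail <= rg->rg_capacity);
	assert(rg->rg_capacity <= rg->rg_length);
	return (uint64_t)rg->rg_avail * fsblock_size;
}

/* Bytes written per full rtgroup, i.e. the file size alignment target. */
static uint64_t bytes_per_rtgroup(const struct rtgroup_geom_sketch *rg,
				  uint32_t fsblock_size)
{
	return (uint64_t)rg->rg_capacity * fsblock_size;
}

/* The example above: 1024-block group, 1000 usable blocks, 200 written. */
static struct rtgroup_geom_sketch example_rtgroup(void)
{
	struct rtgroup_geom_sketch rg = {
		.rg_number = 1,		/* arbitrary group for the example */
		.rg_length = 1024,
		.rg_capacity = 1000,
		.rg_avail = 1000 - 200,
	};
	return rg;
}
```

With the example values and 4096-byte fsblocks, bytes_per_rtgroup()
gives 4096000 and bytes_until_next_rtgroup() gives 3276800, matching the
1000*4k and 800*4k figures above.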