On Fri, May 26, 2023 at 12:04:02PM +0100, Joe Thornber wrote:
> Here's my take:
>
> I don't see why the filesystem cares if thinp is doing a reservation
> or provisioning under the hood. All that matters is that a future
> write to that region will be honoured (barring device failure etc.).
>
> I agree that the reservation/force mapped status needs to be
> inherited by snapshots.
>
> One of the few strengths of thinp is the performance of taking a
> snapshot. Most snapshots created are never activated. Many other
> snapshots are only alive for a brief period, and used read-only. eg,
> blk-archive (https://github.com/jthornber/blk-archive) uses snapshots
> to do very fast incremental backups. As such I'm strongly against any
> scheme that requires provisioning as part of the snapshot operation.
>
> Hank and I are in the middle of the range tree work which requires a
> metadata change. So now is a convenient time to piggyback other
> metadata changes to support reservations.
>
> Given the above this is what I suggest:
>
> 1) We have an api (ioctl, bio flag, whatever) that lets you
> reserve/guarantee a region:
>
>     int reserve_region(dev, sector_t begin, sector_t end);

A C-based interface is not sufficient because the layer that must do
provisioning is not guaranteed to be directly under the filesystem. We
must be able to propagate the request down to the layers that need to
provision storage, and that includes hardware devices.

e.g. dm-thin would have to issue REQ_PROVISION on the LBA ranges it
allocates in its backing device to guarantee that the provisioned LBA
range it allocates is also fully provisioned by the storage below
it....
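To illustrate what I mean, here's a rough sketch of how a filesystem
(or a stacked device like dm-thin) might push a provision request down
to the device below it. This is a sketch only: REQ_OP_PROVISION and
the bio plumbing follow the proposed patchset rather than anything in
mainline, so treat every name here as provisional.

#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * Sketch only: ask the device below us to provision an LBA range.
 * Every layer that allocates backing space (dm-thin, hardware, ...)
 * would be responsible for re-issuing this downwards for the ranges
 * it maps, so the guarantee holds through the whole stack.
 */
static int provision_range(struct block_device *bdev, sector_t sector,
                           sector_t nr_sects)
{
        struct bio *bio;
        int ret;

        /* No data payload; the range is carried in the iter alone,
         * the same way the discard/zeroout helpers do it. */
        bio = bio_alloc(bdev, 0, REQ_OP_PROVISION, GFP_KERNEL);
        bio->bi_iter.bi_sector = sector;
        bio->bi_iter.bi_size = nr_sects << SECTOR_SHIFT;

        ret = submit_bio_wait(bio);
        bio_put(bio);
        return ret;
}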
> This api should be used minimally, eg, critical FS metadata only.

Keep in mind that "critical FS metadata" in this context is any
metadata which could cause the filesystem to hang or enter a global
error state if an unexpected ENOSPC error occurs during a metadata
write IO. Which, in pretty much every journalling filesystem, equates
to all metadata in the filesystem.

For a typical root filesystem, that might be in the range of 1-200MB
(depending on journal size). For larger filesystems with lots of files
in them, it will be in the range of GBs of space. Plan for having to
support tens of GBs of provisioned space in filesystems, not tens of
MBs....

[snip]

> Now this is a lot of work. As well as the kernel changes we'll need
> to update the userland tools: thin_check, thin_ls,
> thin_metadata_unpack, thin_rmap, thin_delta, thin_metadata_pack,
> thin_repair, thin_trim, thin_dump, thin_metadata_size, thin_restore.
> Are we confident that we have buy in from the FS teams that this
> will be widely adopted? Are users asking for this? I really don't
> want to do 6 months of work for nothing.

I think there's 2-3 solid days of coding to fully implement
REQ_PROVISION support in XFS, including userspace tool support. Maybe
a couple of weeks more to flush the bugs out before it's largely ready
to go.

So if there's buy in from the block layer and DM people for
REQ_PROVISION as described, then I'll definitely have XFS support
ready for you to test whenever dm-thinp is ready to go.

I can't speak for other filesystems, but I suspect the only other one
we care about is ext4. btrfs and f2fs don't need dm-thinp, and there
aren't any other filesystems that are used in production on top of
dm-thinp, so I think only XFS and ext4 matter at this point in time.

I suspect that ext4 would be fairly easy to add support for as well.
ext4 has a lot more fixed-place metadata than XFS, so much more of its
metadata is covered by mkfs-time provisioning. Limiting dynamic
metadata to specific fully provisioned block groups, and provisioning
new block groups for metadata when they are near full, would be
equivalent to how I plan to provision metadata space in XFS. Hence the
implementation for ext4 looks to be broadly similar in scope and
complexity to XFS....
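Purely to illustrate that block group policy, not an implementation -
none of these ext4 helpers exist, every name below is made up:

/*
 * Hypothetical sketch of the policy described above: dynamic metadata
 * only goes into block groups that have been fully provisioned, and
 * the next group is provisioned *before* the current ones run out, so
 * a metadata write never sees an unexpected ENOSPC from the thin pool.
 */
#define META_BG_LOW_WATERMARK   128     /* free metadata blocks */

static int get_provisioned_meta_group(struct super_block *sb,
                                      unsigned int *group)
{
        unsigned int bg;
        int err;

        /* Hypothetical: find a provisioned group with metadata space. */
        err = find_provisioned_meta_group(sb, &bg);
        if (err)
                return err;

        if (meta_group_free_blocks(sb, bg) < META_BG_LOW_WATERMARK) {
                /* Getting close to full: provision another group now,
                 * failing early at allocation time rather than at
                 * metadata writeback time. */
                err = provision_block_group(sb, bg + 1);
                if (err)
                        return err;
        }

        *group = bg;
        return 0;
}

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx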