On Thu, May 25, 2023 at 6:36 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Thu, May 25, 2023 at 03:47:21PM -0700, Sarthak Kukreti wrote:
> > On Thu, May 25, 2023 at 9:00 AM Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> > > On Thu, May 25 2023 at 7:39P -0400,
> > > Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > > On Wed, May 24, 2023 at 04:02:49PM -0400, Mike Snitzer wrote:
> > > > > On Tue, May 23 2023 at 8:40P -0400,
> > > > > Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > > > > It's worth noting that XFS already has a coarse-grained
> > > > > > implementation of preferred regions for metadata storage. It will
> > > > > > currently not use those metadata-preferred regions for user data
> > > > > > unless all the remaining user data space is full. Hence I'm pretty
> > > > > > sure that a pre-provisioning enhancement like this can be done
> > > > > > entirely in-memory without requiring any new on-disk state to be
> > > > > > added.
> > > > > >
> > > > > > Sure, if we crash and remount, then we might choose a different LBA
> > > > > > region for pre-provisioning. But that's not really a huge deal as we
> > > > > > could also run an internal background post-mount fstrim operation to
> > > > > > remove any unused pre-provisioning that was left over from when the
> > > > > > system went down.
> > > > >
> > > > > This would be the FITRIM with extension you mention below? Which is a
> > > > > filesystem interface detail?
> > > >
> > > > No. We might reuse some of the internal infrastructure we use to
> > > > implement FITRIM, but that's about it. It's just something kinda
> > > > like FITRIM but with different constraints determined by the
> > > > filesystem rather than the user...
> > > >
> > > > As it is, I'm not sure we'd even need it - a periodic userspace
> > > > FITRIM would achieve the same result, so leaked provisioned spaces
> > > > would get cleaned up eventually without the filesystem having to do
> > > > anything specific...
> > > >
> > > > > So dm-thinp would _not_ need to have new
> > > > > state that tracks "provisioned but unused" blocks?
> > > >
> > > > No idea - that's your domain. :)
> > > >
> > > > dm-snapshot, for certain, will need to track provisioned regions
> > > > because it has to guarantee that overwrites to provisioned space in
> > > > the origin device will always succeed. Hence it needs to know how
> > > > much space breaking sharing in provisioned regions after a snapshot
> > > > has been taken will be required...
> > >
> > > dm-thinp offers its own much more scalable snapshot support (it doesn't
> > > use the old dm-snapshot N-way copyout target).
> > >
> > > dm-snapshot isn't going to be modified to support this level of
> > > hardening (dm-snapshot is basically in "maintenance only" mode now).
>
> Ah, of course. Sorry for the confusion, I was kinda using
> dm-snapshot as shorthand for "dm-thinp + snapshots".
>
> > > But I understand your meaning: what you said is 100% applicable to
> > > dm-thinp's snapshot implementation and needs to be accounted for in
> > > thinp's metadata (inherent 'provisioned' flag).
>
> *nod*
>
> > A bit orthogonal: would dm-thinp need to differentiate between
> > user-triggered provision requests (e.g. from fallocate()) vs
> > fs-triggered requests?
>
> Why? How is the guarantee the block device has to provide to
> provisioned areas different for user vs filesystem internal
> provisioned space?
>
After thinking this through, I stand corrected.
I was primarily concerned with how this would balloon thin snapshot
sizes if users potentially provision a large chunk of the filesystem,
but that's putting the cart way before the horse.

Best
Sarthak

> > I would lean towards user provisioned areas not
> > getting dedup'd on snapshot creation,
>
> <twitch>
>
> Snapshotting is a clone operation, not a dedupe operation.
>
> Yes, the end result of both is that you have a block shared between
> multiple indexes that needs COW on the next overwrite, but the two
> operations that get to that point are very different...
>
> </pedantic mode disengaged>
>
> > but that would entail tracking
> > the state of the original request and possibly a provision request
> > flag (REQ_PROVISION_DEDUP_ON_SNAPSHOT) or an inverse flag
> > (REQ_PROVISION_NODEDUP). Possibly too convoluted...
>
> Let's not try to add everyone's favourite pony to this interface
> before we've even got it off the ground.
>
> It's the simple precision of the API, the lack of cross-layer
> communication requirements and the ability to implement and optimise
> the independent layers independently that makes this a very
> appealing solution.
>
> We need to start with getting the simple stuff working and prove the
> concept. Then once we can observe the behaviour of a working system
> we can start working on optimising individual layers for efficiency
> and performance....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
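
(As an aside, the "periodic userspace FITRIM" Dave mentions above needs no
new tooling: it is the same FITRIM ioctl that fstrim(8) and systemd's
fstrim.timer already issue today. The sketch below is only an illustration
of that existing call, not code from the REQ_PROVISION series; it assumes
the whole filesystem as the trim range and keeps error handling minimal.)

/*
 * Illustrative only (not from the patch series): issue FITRIM on a
 * mounted filesystem, covering its whole address space. This is the
 * ioctl that fstrim(8) wraps; the filesystem responds by discarding
 * free space, which is the mechanism the thread above relies on to
 * eventually reclaim any leaked pre-provisioned space.
 */
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* FITRIM, struct fstrim_range */

int main(int argc, char **argv)
{
	struct fstrim_range range = {
		.start  = 0,
		.len    = ULLONG_MAX,	/* whole filesystem */
		.minlen = 0,		/* let the fs pick the minimum extent */
	};
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY | O_DIRECTORY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (ioctl(fd, FITRIM, &range) < 0) {
		perror("FITRIM");
		return 1;
	}

	/* On success the kernel writes back how many bytes were trimmed. */
	printf("%s: %llu bytes trimmed\n", argv[1],
	       (unsigned long long)range.len);
	return 0;
}

Running something like this (or simply "fstrim -a") on a schedule is what
would eventually clean up provisioned-but-unused space across a crash, per
the discussion above.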