On 3/28/23 19:20, Mike Snitzer wrote:
> On Mon, Mar 27 2023 at 4:24P -0400,
> Gwendal Grignou <gwendal@xxxxxxxxxxxx> wrote:
>
>> On ChromeOS, we are working on migrating file backed loopback devices
>> to thinpool logical volumes using dm-clone on the Chromebook local
>> SSD.
>> Dm-clone hydration workflow is a great fit but the design of dm-clone
>> assumes a read-only source device. Data present in the backing file
>> will be copied to the new logical volume but can be safely deleted
>> only when the hydration process is complete. During migration, some
>> data will be duplicated and usage on the Chromebook SSD will
>> unnecessarily increase.
>> Would it be reasonable to add a discard option when enabling the
>> hydration process to discard data as we go on the source device?
>> 2 implementations are possible:
>> a- add a state to the hydration state machine to ensure a region is
>> discarded before considering another region.
>> b- a simpler implementation where the discard is sent asynchronously
>> at the end of a region copy. It may not complete successfully (in case
>> the device crashes during the hydration for instance), but will vastly
>> reduce the amount of data left in the source device at the end of the
>> hydration.
>>
>> I prefer b) as it is easier to implement, but a) is cleaner from a
>> usage point of view.
>
> In general, discards may not complete for any number of reasons. So
> while a) gives you finer-grained potential for space being
> deallocated, b) would likely suffice given that a device crash is
> pretty unlikely (at least I would think). And in the case of file
> backed loopback devices, independent of dm-clone, you can just issue
> discard(s) to all free space after a crash?
>
> However you elect to do it, you'd do well to make it an optional
> "discard_rw_src" (or some better name) feature that is configured when
> you load the dm-clone target.
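
For illustration only, here is a minimal sketch of what loading the target
with such a flag might look like. It builds a dm-clone table line in the
documented "clone <metadata dev> <destination dev> <source dev> <region size>
[<#feature args> <feature arg>...]" form. The "discard_rw_src" feature is the
name proposed above and does not exist in dm-clone today; the device paths,
sizes and the clone_table() helper are made up for the example.

```python
def clone_table(start, length, metadata_dev, dest_dev, source_dev,
                region_sectors, features=()):
    """Build a dm-clone table line suitable for `dmsetup create --table`.

    start, length and region_sectors are in 512-byte sectors.
    """
    parts = [str(start), str(length), "clone",
             metadata_dev, dest_dev, source_dev, str(region_sectors)]
    if features:
        # Feature arguments are preceded by their count.
        parts.append(str(len(features)))
        parts.extend(features)
    return " ".join(parts)


# Hypothetical devices; "discard_rw_src" is the proposed, not yet existing, flag.
line = clone_table(0, 2097152, "/dev/mapper/meta", "/dev/mapper/dest",
                   "/dev/loop0", 8, ["discard_rw_src"])
print(line)
```

Making the behaviour an opt-in feature argument keeps the default assumption
of a read-only source device intact for existing users.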
I agree with Mike, but I also want to note the following.

dm-clone commits its on-disk metadata periodically every second, and
every time a FLUSH or FUA bio is written. This is done to improve
performance.

This means the dm-clone device behaves like a physical disk that has a
volatile write cache. If power is lost you may lose some recent writes,
_and_ dm-clone might need to rehydrate some regions.

So, you can't discard a region on the source device after the copy
operation has finished, because then the following scenario will result
in data corruption:
1. dm-clone hydrates a region
2. dm-clone discards the region on the source device, either
synchronously (a) or asynchronously (b)
3. The system crashes before the metadata is committed
4. The system comes up, and dm-clone rehydrates the region, because it
thinks it has not been hydrated yet
5. The source device might contain garbage for this region, since we
discarded it previously
6. You have data corruption
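
The scenario above can be reproduced with a small toy model. This is only a
sketch, not dm-clone code: devices are plain dicts mapping region number to
data, the metadata is a set of region numbers, and GARBAGE stands in for
whatever a discarded region happens to read back as. All names are made up
for the example.

```python
GARBAGE = b"\x00"  # assumption: what a discarded source region reads back as


def migrate_with_eager_discard(source, crash_before_commit):
    """Hydrate region 0, discard it on the source right away, then restart."""
    dest = {}
    committed = set()  # on-disk metadata: regions known to be hydrated
    in_core = set()    # hydrated, but metadata not yet committed

    dest[0] = source[0]   # 1. hydrate the region
    in_core.add(0)
    source[0] = GARBAGE   # 2. discard it on the source device

    if not crash_before_commit:
        committed |= in_core  # the metadata commit made it to disk
    # 3. otherwise the crash throws away the in-core state

    # 4. after restart, every region not marked hydrated is copied again
    for region in source:
        if region not in committed:
            dest[region] = source[region]  # 5. copies garbage
    return dest


print(migrate_with_eager_discard({0: b"data"}, crash_before_commit=True))
print(migrate_with_eager_discard({0: b"data"}, crash_before_commit=False))
```

With the crash, region 0 of the destination ends up holding the garbage value;
without it, the original data survives.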
So, you can only discard hydrated regions for which the metadata has
been committed to disk.

I think you could discard hydrated regions on the source device
periodically, right after committing the metadata.

dm-clone keeps track of the regions hydrated during each metadata
transaction, so after committing the metadata for the current
transaction, you could also send an asynchronous discard for these
regions.
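
A rough sketch of that ordering, in the same toy style (this is not how the
dm-clone code is structured; HydrationTracker and its methods are hypothetical
names): regions hydrated in the current transaction are only remembered, and
the discards are issued once the commit for that transaction has succeeded.

```python
class HydrationTracker:
    """Toy model: discard source regions only after their metadata commit."""

    def __init__(self, issue_discard):
        self.committed = set()    # regions recorded as hydrated on disk
        self.current_txn = set()  # regions hydrated since the last commit
        self._issue_discard = issue_discard  # callback, e.g. queue async discard

    def region_hydrated(self, region):
        self.current_txn.add(region)

    def commit(self):
        newly_committed = self.current_txn - self.committed
        # Stands in for writing the metadata to disk; it must succeed first.
        self.committed |= self.current_txn
        self.current_txn = set()
        # Only now is it safe to drop the data from the source device.
        for region in sorted(newly_committed):
            self._issue_discard(region)

    def crash(self):
        # Uncommitted state is lost; nothing was discarded for these regions,
        # so rehydrating them from the source is still safe.
        self.current_txn = set()


discarded = []
tracker = HydrationTracker(discarded.append)
tracker.region_hydrated(3)
tracker.region_hydrated(1)
before_commit = list(discarded)  # nothing discarded yet
tracker.commit()
tracker.region_hydrated(5)
tracker.crash()                  # region 5 was never committed
tracker.commit()
print(before_commit, discarded, sorted(tracker.committed))
```

If a discard is lost after the commit, the only cost is some space left
allocated on the source, which can be reclaimed later.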
Nikos.
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel