Re: [PATCH v6 23/28] btrfs: support dev-replace in HMZONED mode

Josef Bacik <josef@xxxxxxxxxxxxxx> · Tue, 17 Dec 2019 16:05:25 -0500

On 12/12/19 11:09 PM, Naohiro Aota wrote:
We have two type of I/Os during the device-replace process. One is a I/O to
"copy" (by the scrub functions) all the device extents on the source device
to the destination device.  The other one is a I/O to "clone" (by
handle_ops_on_dev_replace()) new incoming write I/Os from users to the
source device into the target device.

Cloning incoming I/Os can break the sequential write rule in the target
device. When write is mapped in the middle of a block group, that I/O is
directed in the middle of a zone of target device, which breaks the
sequential write rule.

However, the cloning function cannot be simply disabled since incoming I/Os
targeting already copied device extents must be cloned so that the I/O is
executed on the target device.

We cannot use dev_replace->cursor_{left,right} to determine whether bio is
going to not yet copied region.  Since we have time gap between finishing
btrfs_scrub_dev() and rewriting the mapping tree in
btrfs_dev_replace_finishing(), we can have newly allocated device extent
which is never cloned nor copied.

So the point is to copy only already existing device extents. This patch
introduces mark_block_group_to_copy() to mark existing block group as a
target of copying. Then, handle_ops_on_dev_replace() and dev-replace can
check the flag to do their job.

Device-replace process in HMZONED mode must copy or clone all the extents
in the source device exctly once.  So, we need to use to ensure allocations
started just before the dev-replace process to have their corresponding
extent information in the B-trees. finish_extent_writes_for_hmzoned()
implements that functionality, which basically is the removed code in the
commit 042528f8d840 ("Btrfs: fix block group remaining RO forever after
error during device replace").

This patch also handles empty region between used extents. Since
dev-replace is smart to copy only used extents on source device, we have to
fill the gap to honor the sequential write rule in the target device.

Signed-off-by: Naohiro Aota <naohiro.aota@xxxxxxx>

Can you split up the copying part and the cloning part into different patches, 
this is a bear to review.  Also I don't quite understand the zeroout behavior. 
It _looks_ like for cloning you are doing a zeroout for the gap between the last 
wp position and the current cloned bio, which makes sense, but doesn't this gap 
exist because copying is ongoing?  Can you copy into a zero'ed out position?  Or 
am I missing something here?  Thanks,

Josef