Assume that the metadata of cow on the disk is corrupted after init for some reason: old chunk-id new chunk-id 0 2 ... x ---> 0 y After starting merge, old chunk 0 will be updated twice, and old chunk x will not be updated. And dm-snap will print err log after merge new chunk 2 to old chunk 0: <3>[ 731.921642] (1)[4092:kworker/1:0]device-mapper: snapshots: Corruption detected: exception for block 0 is on disk but not in memory then set snap->merge_failed to true. If userspace use "sectors_allocated == metadata_sectors" to determine whether the merge is completed, there maybe the following race that makes the userspace unable to know merge fail event: kernel merge kworker userspace process merge_callback ->commit_merge get snapshot_status by ioctl ->remove_single_exception_chunk set merge_failed to true think merge has been completed, switch device to another target Could we call remove_single_exception_chunk first to solve this race? Signed-off-by: Jiazi.Li <jiazi.li@xxxxxxxxxxxxx> --- drivers/md/dm-snap.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index f766c21408f1..f658d05752f2 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -1141,15 +1141,15 @@ static void merge_callback(int read_err, unsigned long write_err, void *context) goto shut; } + if (remove_single_exception_chunk(s) < 0) + goto shut; + if (s->store->type->commit_merge(s->store, s->num_merging_chunks) < 0) { DMERR("Write error in exception store: shutting down merge"); goto shut; } - if (remove_single_exception_chunk(s) < 0) - goto shut; - snapshot_merge_next_chunks(s); return; -- 2.17.1 -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel