On Thu, Aug 09 2018 at 12:22pm -0400,
Ilya Dryomov <idryomov@xxxxxxxxx> wrote:

> Quoting Documentation/device-mapper/cache.txt:
>
> The 'dirty' state for a cache block changes far too frequently for us
> to keep updating it on the fly. So we treat it as a hint. In normal
> operation it will be written when the dm device is suspended. If the
> system crashes all cache blocks will be assumed dirty when restarted.
>
> This got broken in commit f177940a8091 ("dm cache metadata: switch to
> using the new cursor api for loading metadata") in 4.9, which removed
> the code that consulted cmd->clean_when_opened (CLEAN_SHUTDOWN on-disk
> flag) when loading cache blocks. This results in data corruption on an
> unclean shutdown with dirty cache blocks on the fast device. After the
> crash those blocks are considered clean and may get evicted from the
> cache at any time. This can be demonstrated by doing a lot of reads
> to trigger individual evictions, but uncache is more predictable:
>
> ### Disable auto-activation in lvm.conf to be able to do uncache in
> ### time (i.e. see uncache doing flushing) when the fix is applied.
>
> # xfs_io -d -c 'pwrite -b 4M -S 0xaa 0 1G' /dev/vdb
> # vgcreate vg_cache /dev/vdb /dev/vdc
> # lvcreate -L 1G -n lv_slowdev vg_cache /dev/vdb
> # lvcreate -L 512M -n lv_cachedev vg_cache /dev/vdc
> # lvcreate -L 256M -n lv_metadev vg_cache /dev/vdc
> # lvconvert --type cache-pool --cachemode writeback vg_cache/lv_cachedev --poolmetadata vg_cache/lv_metadev
> # lvconvert --type cache vg_cache/lv_slowdev --cachepool vg_cache/lv_cachedev
> # xfs_io -d -c 'pwrite -b 4M -S 0xbb 0 512M' /dev/mapper/vg_cache-lv_slowdev
> # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
> 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> # dmsetup status vg_cache-lv_slowdev
> 0 2097152 cache 8 27/65536 128 8192/8192 1 100 0 0 0 8192 7065 2 metadata2 writeback 2 migration_threshold 2048 smq 0 rw -
>                                                           ^^^^
> 7065 * 64k = 441M yet to be written to the slow device
> # echo b >/proc/sysrq-trigger
>
> # vgchange -ay vg_cache
> # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
> 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> # lvconvert --uncache vg_cache/lv_slowdev
> Flushing 0 blocks for cache vg_cache/lv_slowdev.
> Logical volume "lv_cachedev" successfully removed
> Logical volume vg_cache/lv_slowdev is not cached.
> # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
> 0fe00000: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................
> 0fe00010: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................
>
> This is the case with both v1 and v2 cache pool metadata formats.
>
> After applying this patch:
>
> # vgchange -ay vg_cache
> # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
> 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> # lvconvert --uncache vg_cache/lv_slowdev
> Flushing 3724 blocks for cache vg_cache/lv_slowdev.
> ...
> Flushing 71 blocks for cache vg_cache/lv_slowdev.
> Logical volume "lv_cachedev" successfully removed
> Logical volume vg_cache/lv_slowdev is not cached.
> # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
> 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: f177940a8091 ("dm cache metadata: switch to using the new cursor api for loading metadata")
> Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx>
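[Editorial note: a minimal sketch of the idea behind the fix, for illustration only. The helper name block_is_dirty(), the parameter names and the flag value are assumptions, not the actual dm-cache-metadata code; the real change re-introduces the check of cmd->clean_when_opened when cache blocks are loaded. The point is that the per-block dirty bit in the metadata is only a hint written at suspend time, so it can be trusted only when CLEAN_SHUTDOWN was recorded; after a crash every cache block has to be treated as dirty.]

#include <stdbool.h>

/* Hypothetical per-mapping flag bit carrying the on-disk dirty hint. */
#define M_DIRTY (1u << 1)

/*
 * Decide the in-core dirty state of a cache block while loading the
 * on-disk mappings.  clean_when_opened mirrors the CLEAN_SHUTDOWN flag
 * read from the metadata superblock at open time.
 */
static bool block_is_dirty(bool clean_when_opened, unsigned int mapping_flags)
{
	if (clean_when_opened)
		return mapping_flags & M_DIRTY;	/* hint written at suspend, trust it */

	/* Unclean shutdown: the hint is stale, assume the block is dirty. */
	return true;
}

With the hint no longer trusted after a crash, the blocks that were still dirty in the example above are flushed by "lvconvert --uncache" instead of being silently dropped, which is what the "Flushing 3724 blocks" output demonstrates.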
I staged this earlier today for 4.19 inclusion, please see:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.19&id=5b1fe7bec8a8d0cc547a22e7ddc2bd59acd67de4

Thanks so much for your thorough work on this. Very well done!

Mike

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel