On 21.6.2016 at 09:56, Dennis Yang wrote:
Hi,

We have been dealing with a data corruption issue when we run our own I/O test suite against multiple thin devices built on top of a thin-pool. The test suite creates multiple thin devices and continually writes to them and checks the file checksums. If no checksum error occurs, it deletes all files and issues DISCARD to reclaim space.

We found one data access pattern that can corrupt data. Suppose there are two thin devices, A and B, and device A receives a DISCARD bio for physical (pool) block 100. Device A quiesces all previous I/O and holds both the virtual and the physical data cell before it actually removes the corresponding data mapping. After the data mapping is removed, both data cells are released and the DISCARD bio is passed down to the underlying devices.

If device B tries to allocate a new block at that very moment, it could reuse block 100, which device A has just discarded (assuming a metadata commit has been triggered, since a block cannot be reused within the same transaction). In this case there is a race between the WRITE bio from device B and the DISCARD bio from device A. If the WRITE bio completes before the DISCARD bio, device B sees a checksum error.

So my question is: does dm-thin have any mechanism to prevent this race when a discarded block is immediately reused by another device? Any help would be appreciated.

Thanks,
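The interleaving described above can be illustrated with a toy model. This is a sketch under stated assumptions, not dm-thin code: `simulate`, `free`, `discard_in_flight` and `hold_until_discard_done` are hypothetical names, and a discarded block is modelled as reading back zeros, which real devices do not guarantee. The `hold_until_discard_done` flag shows one conceivable mitigation (keeping a freed block away from the allocator until its DISCARD completes); it is not taken from this thread or from the kernel.

```python
# Toy model of the reported race; all names are hypothetical, not dm-thin code.
# Assumption: a discarded block reads back as zeros (real devices don't guarantee this).
ZEROS = b"\x00" * 8


def simulate(hold_until_discard_done: bool) -> tuple[int, bytes]:
    """Return (block device B was given, what B reads back afterwards)."""
    data = {100: b"A-data..", 101: ZEROS}  # physical pool blocks -> contents
    free = [101]                           # blocks available for allocation
    discard_in_flight = set()              # used only by the mitigation

    # Device A: mapping for block 100 removed, cells released, block freed
    # (assume the metadata commit has happened, so the block is reusable).
    free.insert(0, 100)
    if hold_until_discard_done:
        discard_in_flight.add(100)         # hypothetical mitigation: block off-limits
    pending_discard = 100                  # DISCARD bio passed down, not yet completed

    # Device B allocates a new block at the same moment.
    block = next(b for b in free if b not in discard_in_flight)
    free.remove(block)

    # B's WRITE completes before A's DISCARD ...
    data[block] = b"B-data.."
    # ... then the DISCARD completes and wipes block 100.
    data[pending_discard] = ZEROS
    discard_in_flight.discard(pending_discard)

    return block, data[block]


if __name__ == "__main__":
    for hold in (False, True):
        block, contents = simulate(hold)
        ok = contents == b"B-data.."
        print(f"hold={hold}: B got block {block}, checksum {'ok' if ok else 'MISMATCH'}")
```

Without the guard, B is handed block 100 and its data is wiped by the late DISCARD; with the guard, B is handed block 101 and its data survives.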
Please provide the kernel version and the versions of the surrounding tools (OS release). Also, are you using 'lvm2', or do you use 'dmsetup'/ioctl directly? (In the latter case we would need to see the exact sequence of operations.) Also please provide a reproducer script.

Regards

Zdenek

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel