On Jun 24, 2024 / 04:46, hch@xxxxxxxxxxxxx wrote:
> On Mon, Jun 24, 2024 at 11:21:33AM +0000, Shinichiro Kawasaki wrote:
> > I took a look in the test script and dm-dust code again, and now I think the dd
> > command is expected to succeed. The added bad blocks have the default wr_fail_cnt
> > value 0, so a write error should not happen for the dd command. (Bryan, if this
> > understanding is wrong, please let me know.)
> >
> > So the error log that Christoph observes indicates that the dd command failed,
> > and this failure is unexpected. I cannot think of any cause of the failure.
>
> Yes, it does indeed fail, this is 002.full with your patch:
>
> dd: error writing '/dev/mapper/dust1': Invalid argument
> 1+0 records in
> 0+0 records out
> 0 bytes copied, 0.000373943 s, 0.0 kB/s

Thank you for sharing the logs. The dd failed at the very first block.

> >
> > Christoph, may I ask you to share the kernel messages during the test run?
> > Also, I would like to check the dd command output. The one-liner patch below
> > for blktests will create results/vdb/dm/002.full with the dd output.
>
> the relevant lines of dmesg output below:
>
> [ 57.773967] run blktests dm/002 at 2024-06-24 11:43:53
> [ 57.791251] I/O error, dev vdb, sector 774 op 0x0:(READ) flags 0x80700 phys_seg 250 prio class 0
> [ 57.791849] I/O error, dev vdb, sector 520 op 0x0:(READ) flags 0x84700 phys_seg 254 prio class 0
> [ 57.792420] I/O error, dev vdb, sector 520 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 57.792805] I/O error, dev vdb, sector 521 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 57.793190] I/O error, dev vdb, sector 522 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 57.793578] I/O error, dev vdb, sector 523 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 57.793955] I/O error, dev vdb, sector 524 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 57.794318] I/O error, dev vdb, sector 525 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 57.794700] I/O error, dev vdb, sector 526 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 57.795130] I/O error, dev vdb, sector 527 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 57.795516] Buffer I/O error on dev dm-0, logical block 65, async page read
> [ 57.800743] device-mapper: dust: dust_add_block: badblock added at block 60 with write fail count 0
> [ 57.802587] device-mapper: dust: dust_add_block: badblock added at block 67 with write fail count 0
> [ 57.804359] device-mapper: dust: dust_add_block: badblock added at block 72 with write fail count 0
> [ 57.811253] device-mapper: dust: dust_add_block: badblock added at block 60 with write fail count 0
> [ 57.813065] device-mapper: dust: dust_add_block: badblock added at block 67 with write fail count 0
> [ 57.814786] device-mapper: dust: dust_add_block: badblock added at block 72 with write fail count 0
> [ 57.818023] device-mapper: dust: enabling read failures on bad sectors
> [ 57.826500] Buffer I/O error on dev dm-0, logical block 8, async page read

Hmm, there are many I/O errors for /dev/vdb and /dev/mapper/dust1. When the test
case succeeds, no I/O error is reported [2]. So the next question is: why do the
I/O errors happen?

My guess is that TEST_DEV might have a non-512 block size, probably 4096. The
test case dm/002 specifies 512 bytes as the block size for /dev/mapper/dust1, so
a 4096-byte block size on TEST_DEV will cause I/O errors due to the mismatched
block sizes. I ran dm/002 with a null_blk device with a 4096-byte block size and
observed very similar symptoms to what Christoph observed.
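In case it helps to reproduce this locally, a 4096-byte null_blk device can be set
up through configfs along the following lines. This is only a sketch: the device
name, size, and blktests config are illustrative and depend on the local setup.

  modprobe null_blk nr_devices=0
  mkdir /sys/kernel/config/nullb/nullb0
  echo 4096 > /sys/kernel/config/nullb/nullb0/blocksize    # 4096-byte logical blocks
  echo 1024 > /sys/kernel/config/nullb/nullb0/size         # device size in MB
  echo 1 > /sys/kernel/config/nullb/nullb0/memory_backed
  echo 1 > /sys/kernel/config/nullb/nullb0/power           # instantiate /dev/nullb0

  # point blktests at the new device and run the test case
  echo 'TEST_DEVS=(/dev/nullb0)' > config
  ./check dm/002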
Christoph, what is the output of "blockdev --getbsz /dev/vdb" on your system? If
it is not 512, could you try the blktests-side fix patch candidate [3]?

[2] Kernel messages in the success case:

[17024.868423] [ T160674] run blktests dm/002 at 2024-06-24 20:07:17
[17024.983252] [ T160697] device-mapper: dust: dust_add_block: badblock added at block 60 with write fail count 0
[17024.992010] [ T160698] device-mapper: dust: dust_add_block: badblock added at block 67 with write fail count 0
[17024.999706] [ T160699] device-mapper: dust: dust_add_block: badblock added at block 72 with write fail count 0
[17025.029553] [ T160704] device-mapper: dust: dust_add_block: badblock added at block 60 with write fail count 0
[17025.036658] [ T160705] device-mapper: dust: dust_add_block: badblock added at block 67 with write fail count 0
[17025.044143] [ T160706] device-mapper: dust: dust_add_block: badblock added at block 72 with write fail count 0
[17025.058206] [ T160708] device-mapper: dust: enabling read failures on bad sectors
[17025.067598] [ T160709] device-mapper: dust: block 60 removed from badblocklist by write
[17025.069738] [ T160709] device-mapper: dust: block 67 removed from badblocklist by write
[17025.071753] [ T160709] device-mapper: dust: block 72 removed from badblocklist by write

[3]

diff --git a/tests/dm/002 b/tests/dm/002
index 6635c43..f60e3bf 100755
--- a/tests/dm/002
+++ b/tests/dm/002
@@ -14,10 +14,12 @@ requires() {
 }
 
 test_device() {
+	local sz bsz
 	echo "Running ${TEST_NAME}"
 
-	TEST_DEV_SZ=$(blockdev --getsz "$TEST_DEV")
-	dmsetup create dust1 --table "0 $TEST_DEV_SZ dust $TEST_DEV 0 512"
+	sz=$(blockdev --getsz "$TEST_DEV")
+	bsz=$(blockdev --getbsz "$TEST_DEV")
+	dmsetup create dust1 --table "0 $sz dust $TEST_DEV 0 $bsz"
 	dmsetup message dust1 0 addbadblock 60
 	dmsetup message dust1 0 addbadblock 67
 	dmsetup message dust1 0 addbadblock 72
@@ -30,7 +32,7 @@ test_device() {
 	dmsetup message dust1 0 addbadblock 72
 	dmsetup message dust1 0 countbadblocks
 	dmsetup message dust1 0 enable
-	dd if=/dev/zero of=/dev/mapper/dust1 bs=512 count=128 oflag=direct >/dev/null 2>&1 || return $?
+	dd if=/dev/zero of=/dev/mapper/dust1 bs=$bsz count=128 oflag=direct >$FULL 2>&1 || return $?
 	sync
 	dmsetup message dust1 0 countbadblocks
 	sync
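For what it is worth, the effect of the change can also be checked by hand outside
blktests, roughly like this (a sketch only; /dev/vdb stands in for the actual test
device, which must not hold any data you care about):

  sz=$(blockdev --getsz /dev/vdb)      # device size in 512-byte sectors
  bsz=$(blockdev --getbsz /dev/vdb)    # block size reported by the kernel, e.g. 4096
  dmsetup create dust1 --table "0 $sz dust /dev/vdb 0 $bsz"
  dd if=/dev/zero of=/dev/mapper/dust1 bs=$bsz count=128 oflag=direct
  dmsetup remove dust1

With bs matching the device block size, the direct-I/O dd is expected to complete
without the "Invalid argument" error seen in 002.full.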