Hello block people,

I'm running some experiments using the attached init_vg.txt script, and at the same time I have the following systemtap script active:

probe kernel.statement("loop_clr_fd@drivers/block/loop.c:896")
{
	printf("Unbound device %s\n", kernel_string($lo->lo_disk->disk_name));
}

probe kernel.statement("loop_set_fd@drivers/block/loop.c:780")
{
	printf("Bound device: %s\n", kernel_string($lo->lo_disk->disk_name));
	//print_backtrace();
}

probe kernel.statement("__blk_mq_run_hw_queue@block/blk-mq.c:814")
{
	printf("error in __blk_mq_run_hw_queue for dev %s\n", kernel_string($bd->rq->rq_disk->disk_name));
	print_backtrace();
	print("----------------------------------\n");
}

From time to time this produces the following output:

Unbound device loop3
error in __blk_mq_run_hw_queue for dev loop3
 0xffffffff8134ef6b : __blk_mq_run_hw_queue+0x29b/0x380 [kernel]
 0xffffffff8134f10a : blk_mq_run_hw_queue+0x6a/0x80 [kernel]
 0xffffffff8134faeb : blk_mq_insert_requests+0xdb/0x120 [kernel]
 0xffffffff8134fc54 : blk_mq_flush_plug_list+0x124/0x140 [kernel]
 0xffffffff81346886 : blk_flush_plug_list+0xc6/0x1f0 [kernel]
 0xffffffff813469e4 : blk_finish_plug+0x34/0x50 [kernel]
 0xffffffff811de687 : do_blockdev_direct_IO+0x757/0xbf0 [kernel]
 0xffffffff811deb63 : __blockdev_direct_IO+0x43/0x50 [kernel]
 0xffffffff811da8b8 : blkdev_direct_IO+0x58/0x80 [kernel]
 0xffffffff8112b73f : generic_file_read_iter+0x13f/0x150 [kernel]
 0xffffffff811d9fd7 : blkdev_read_iter+0x37/0x40 [kernel]
 0xffffffff811a1d13 : __vfs_read+0xd3/0xf0 [kernel]
 0xffffffff811a1ea7 : vfs_read+0x97/0xe0 [kernel]
 0xffffffff811a287a : sys_read+0x5a/0xc0 [kernel]
 0xffffffff8162102e : entry_SYSCALL_64_fastpath+0x12/0x71 [kernel]
----------------------------------
Bound device: loop3

At the same time I get the following output in dmesg:

blk-mq: bad return on queue: -5   <-- this -EIO is returned from loop_queue_rq
blk_update_request: I/O error, dev loop3, sector 0

To me this means that device disabling can race with pending IO plugs for that device. I wonder whether it would be possible to flush any plugs for a particular device before disabling its multiqueue (a rough sketch of what I have in mind is at the end of this mail)? Alternatively, plug flushing could be delayed until we know the device is actually active. However, I can see a problem with the latter approach, since it would allow the following scenario:

1. The device is attached to the system and writes proceed normally.
2. A process plugs the device and starts queuing IO on the plug.
3. The device is detached from the system.
4. The plug-flushing code detects (3) and waits until the device is re-attached.
5. The device is re-attached.
6. The plug from (2) is flushed.

The device attached in (5) might not be the same device as in (1), which would mean (6) ends up writing potentially random data with respect to the device attached in (5).

Essentially, is it normal to have IO fail in such situations?
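For the first approach, I imagine something along the lines of the sketch below. It is completely untested, and whether blk_mq_freeze_queue()/blk_mq_unfreeze_queue() may safely be called from this context is an assumption on my part; the idea is just to freeze the mq queue in loop_clr_fd() so that everything that has already entered the queue, including requests still sitting in some task's plug, is drained before any teardown happens:

/*
 * Sketch only, not a real patch: quiesce the queue before tearing the
 * loop device down.
 */
static int loop_clr_fd(struct loop_device *lo)
{
	/*
	 * Requests sitting in a task's plug already hold a reference on
	 * the queue, so freezing waits for them to be flushed and
	 * completed, and makes new submitters block at queue entry
	 * until we unfreeze.
	 */
	blk_mq_freeze_queue(lo->lo_queue);

	/* ... existing teardown: clear lo_backing_file, lo_state, ... */

	blk_mq_unfreeze_queue(lo->lo_queue);
	return 0;
}

That still wouldn't help a plug that is flushed after the unfreeze, of course, since at that point the request rightfully gets -EIO, which brings me back to the question of whether such failures are simply expected.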
#!/bin/bash

# Pick a random number between 21 and 49, used as the dd blocksize (MiB).
function get_random() {
	local number=0
	while [ "$number" -le 20 ]; do
		number=$RANDOM
		let "number %= 50"
	done
	echo $number
}

file=$(mktemp -u --tmpdir=. vgfile.XXXX)
group=$(mktemp -u testgrp-XXXX)
thingroup=$(mktemp -u thingrp-XXXX)
mntpath=$(mktemp -d --tmpdir=. mntdir-XXXX)
volume_name=$(mktemp -u testvol-XXXX)
volume_size=200M

# Create a sparse backing file and build a thin LVM volume on top of it.
truncate ${file} --size 10G
loopdev=$(losetup -f --show ${file})
pvcreate --metadatasize 1M ${loopdev}
vgcreate ${group} -s 1MiB ${loopdev}

pe_size=$(vgdisplay "/dev/${group}" | grep 'PE Size' | awk '{print $3}')
thin_size=$(echo "$(vgdisplay "/dev/${group}" | grep 'Free PE' | awk '{print $5}')*${pe_size}-180" | bc -l)

lvcreate --ignoreactivationskip -Z n -L ${thin_size}M -T "/dev/${group}/${thingroup}"
lvcreate --ignoreactivationskip -V${volume_size} -T "${group}/${thingroup}" -n "${volume_name}"

mkfs.ext4 /dev/$group/$volume_name
sync
vgchange -Kan $group
losetup -d $loopdev

echo "Volume created, doing work"

# Repeatedly attach the loop device, mount, write, unmount, and detach.
for i in {1..10}; do
	echo "Doing iteration $i"
	loopdev=$(losetup -f --show ${file})
	vgchange -Kay $group

	if ! mount /dev/$group/$volume_name $mntpath; then
		echo "mount failed"
		exit 1
	fi

	rm -rf $mntpath/*
	dd if=/dev/urandom of=$mntpath/$(mktemp -u tmpfile.XXXX) bs=$(get_random)M count=1
	umount $mntpath

	vgchange -Kan $group
	losetup -d $loopdev
done