Hi Ming,

On 01/04/2016 05:23 PM, Ming Lei wrote:
> On Mon, Jan 4, 2016 at 4:21 PM, Nikolay Borisov
> <n.borisov@xxxxxxxxxxxxxx> wrote:
>> Hello block people,
>>
>> I'm running some experiments using the attached init_vg.txt script, and
>> at the same time I have the following systemtap script active:
>>
>> probe kernel.statement("loop_clr_fd@drivers/block/loop.c:896") {
>>     printf("Unbound device %s\n", kernel_string($lo->lo_disk->disk_name));
>> }
>>
>>
>> probe kernel.statement("loop_set_fd@drivers/block/loop.c:780") {
>>     printf("Bound device: %s\n", kernel_string($lo->lo_disk->disk_name));
>>     //print_backtrace();
>> }
>>
>> probe kernel.statement("__blk_mq_run_hw_queue@block/blk-mq.c:814") {
>>     printf("error in blk_mq_run_hq_queue for dev %s\n", kernel_string($bd->rq->rq_disk->disk_name));
>>     print_backtrace();
>>     print("----------------------------------\n");
>> }
>>
>> Which produces the following output from time to time:
>>
>> Unbound device loop3
>> error in blk_mq_run_hq_queue for dev loop3
>> 0xffffffff8134ef6b : __blk_mq_run_hw_queue+0x29b/0x380 [kernel]
>> 0xffffffff8134f10a : blk_mq_run_hw_queue+0x6a/0x80 [kernel]
>> 0xffffffff8134faeb : blk_mq_insert_requests+0xdb/0x120 [kernel]
>> 0xffffffff8134fc54 : blk_mq_flush_plug_list+0x124/0x140 [kernel]
>> 0xffffffff81346886 : blk_flush_plug_list+0xc6/0x1f0 [kernel]
>> 0xffffffff813469e4 : blk_finish_plug+0x34/0x50 [kernel]
>> 0xffffffff811de687 : do_blockdev_direct_IO+0x757/0xbf0 [kernel]
>> 0xffffffff811deb63 : __blockdev_direct_IO+0x43/0x50 [kernel]
>> 0xffffffff811da8b8 : blkdev_direct_IO+0x58/0x80 [kernel]
>> 0xffffffff8112b73f : generic_file_read_iter+0x13f/0x150 [kernel]
>> 0xffffffff811d9fd7 : blkdev_read_iter+0x37/0x40 [kernel]
>> 0xffffffff811a1d13 : __vfs_read+0xd3/0xf0 [kernel]
>> 0xffffffff811a1ea7 : vfs_read+0x97/0xe0 [kernel]
>> 0xffffffff811a287a : sys_read+0x5a/0xc0 [kernel]
>> 0xffffffff8162102e : entry_SYSCALL_64_fastpath+0x12/0x71 [kernel]
>> ----------------------------------
>> Bound device: loop3
>>
>> At the same time I get the following output in dmesg:
>> blk-mq: bad return on queue: -5 <-- This -EIO code is returned from loop_queue_rq
>> blk_update_request: I/O error, dev loop3, sector 0
>>
>> To me this means it's possible that device disabling races with
>> pending IO plugs for this device. I wonder whether it would be possible
>> to flush any plugs for a particular device before disabling its
>> multiqueue? Or maybe delay the plug flushing until we know the device
>
> Yes, you should detach the loop block only after all pending I/Os to the
> current loop block have completed. For example, umount and lvremove should
> be run before deleting the loop in your test case, and these paths are
> entirely controlled by user space.
>
>> is actually active. Though I can see a problem with the latter approach
>> since this would mean it's possible to have the following scenario:
>>
>> 1. Device is attached to system and writes are going normally
>> 2. A process plugs the device and starts queuing IO on the plug
>> 3. The device is detached from the system
>> 4. Plug flushing code detects (3) and waits until device is re-attached
>> 5. Device is reattached
>> 6. Plug from (4) is flushed.
>>
>> However, the device attached in (5) might not be the same device as in
>> (1), and this would mean that (6) would be writing potentially random
>> data WRT the device attached in (5).
>
> It is the user's responsibility to complete all pending I/O to the current
> (old) loop before the new loop is attached again, because both paths are
> ultimately driven from user space. And these I/Os will be completed as -EIO
> and won't reach the backing file at all, so how can the above case happen?

It can't happen, I was just thinking out loud. As I pointed out, this
seems a rather bogus scenario.

>> Essentially, is it normal to have IO fail in such situations?
>
> cat init_vg.txt
> ...
> loopdev=$(losetup -f --show ${file})
> pvcreate --metadatasize 1M ${loopdev}
> vgcreate ${group} -s 1MiB ${loopdev}
> ...
> umount $mntpath
> vgchange -Kan $group
> losetup -d $loopdev
>
> As for your test case above, it is normal for the IO to fail after
> the loop block is deleted, and you should remove the volume group
> before deleting the loop block.

But in this case the filesystem (which is on the volume group, which is
on the loop device) is unmounted first, then the volume group is
deactivated, which at that point should stop all IO, and only then is
the loop device nuked. Yet I can still see IO in transit. Based on this,
it seems that vgchange might not be flushing everything. I mostly see
the failures occur with reads.
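Ming's point that plugged I/O flushed after the detach is completed with -EIO, rather than being held and replayed against whatever file gets bound next, can be illustrated with a small user-space toy model. This is a sketch under stated assumptions, not kernel code: `toy_loop`, `toy_plug`, `toy_queue_rq()`, `toy_flush_plug()` and `toy_scenario()` are hypothetical names, and the state check only loosely mirrors the one that the dmesg annotation above attributes to `loop_queue_rq`. The scenario function walks through steps 1 to 6 from the thread.

```c
/*
 * Toy user-space model of the plug/detach race discussed above.
 * NOT kernel code: all names are hypothetical, and toy_queue_rq()
 * only loosely mirrors the "not bound -> -EIO" check in loop_queue_rq().
 */
#include <assert.h>
#include <errno.h>
#include <stdio.h>

enum toy_lo_state { TOY_UNBOUND, TOY_BOUND };

struct toy_loop {
    enum toy_lo_state state;
    int backing_id;     /* which (hypothetical) backing file is attached */
    int ios_to_backing; /* requests that actually reached a backing file */
};

#define TOY_PLUG_MAX 8

/* Per-task plug: requests are collected here but not yet dispatched. */
struct toy_plug {
    int sectors[TOY_PLUG_MAX];
    int count;
};

static void toy_plug_add(struct toy_plug *plug, int sector)
{
    if (plug->count < TOY_PLUG_MAX)
        plug->sectors[plug->count++] = sector;
}

/* Driver hook: refuses I/O once the backing file has been cleared. */
static int toy_queue_rq(struct toy_loop *lo, int sector)
{
    (void)sector;
    if (lo->state != TOY_BOUND)
        return -EIO;
    lo->ios_to_backing++;
    return 0;
}

/*
 * Flushing dispatches every plugged request exactly once. A failed
 * request is completed with its error; it is not parked until a later
 * re-bind. Returns the number of failed requests.
 */
static int toy_flush_plug(struct toy_loop *lo, struct toy_plug *plug)
{
    int failed = 0;

    for (int i = 0; i < plug->count; i++) {
        int ret = toy_queue_rq(lo, plug->sectors[i]);
        if (ret) {
            printf("I/O error %d, sector %d\n", ret, plug->sectors[i]);
            failed++;
        }
    }
    plug->count = 0; /* the plug is empty after a flush, success or not */
    return failed;
}

/* Steps 1-6 from the scenario above; returns the failed request count. */
int toy_scenario(struct toy_loop *lo)
{
    struct toy_plug plug = { .count = 0 };
    int failed;

    lo->state = TOY_BOUND;               /* 1. attached to backing file 1 */
    lo->backing_id = 1;
    lo->ios_to_backing = 0;
    toy_plug_add(&plug, 0);              /* 2. task queues I/O on its plug */
    toy_plug_add(&plug, 8);
    lo->state = TOY_UNBOUND;             /* 3. detached (losetup -d) */
    failed = toy_flush_plug(lo, &plug);  /* 4. flush: each request gets -EIO */
    lo->state = TOY_BOUND;               /* 5. re-attached to backing file 2 */
    lo->backing_id = 2;
    failed += toy_flush_plug(lo, &plug); /* 6. nothing left to replay */
    return failed;
}
```

Calling `toy_scenario()` from a `main()` prints one I/O error line per plugged request and returns 2, while `ios_to_backing` stays 0: in this model the stale requests fail at step 4 and never touch the file attached at step 5, which is why the replay scenario cannot arise.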