On Mon, Jan 4, 2016 at 11:31 PM, Nikolay Borisov <n.borisov@xxxxxxxxxxxxxx> wrote:
> Hi Ming,
>
> On 01/04/2016 05:23 PM, Ming Lei wrote:
>> On Mon, Jan 4, 2016 at 4:21 PM, Nikolay Borisov
>> <n.borisov@xxxxxxxxxxxxxx> wrote:
>>> Hello block people,
>>>
>>> I'm running some experiments using the attached init_vg.txt script, and
>>> at the same time I have the following systemtap script active:
>>>
>>> probe kernel.statement("loop_clr_fd@drivers/block/loop.c:896") {
>>>     printf("Unbound device %s\n", kernel_string($lo->lo_disk->disk_name));
>>> }
>>>
>>> probe kernel.statement("loop_set_fd@drivers/block/loop.c:780") {
>>>     printf("Bound device: %s\n", kernel_string($lo->lo_disk->disk_name));
>>>     //print_backtrace();
>>> }
>>>
>>> probe kernel.statement("__blk_mq_run_hw_queue@block/blk-mq.c:814") {
>>>     printf("error in blk_mq_run_hw_queue for dev %s\n", kernel_string($bd->rq->rq_disk->disk_name));
>>>     print_backtrace();
>>>     print("----------------------------------\n");
>>> }
>>>
>>> This produces the following output from time to time:
>>>
>>> Unbound device loop3
>>> error in blk_mq_run_hw_queue for dev loop3
>>>  0xffffffff8134ef6b : __blk_mq_run_hw_queue+0x29b/0x380 [kernel]
>>>  0xffffffff8134f10a : blk_mq_run_hw_queue+0x6a/0x80 [kernel]
>>>  0xffffffff8134faeb : blk_mq_insert_requests+0xdb/0x120 [kernel]
>>>  0xffffffff8134fc54 : blk_mq_flush_plug_list+0x124/0x140 [kernel]
>>>  0xffffffff81346886 : blk_flush_plug_list+0xc6/0x1f0 [kernel]
>>>  0xffffffff813469e4 : blk_finish_plug+0x34/0x50 [kernel]
>>>  0xffffffff811de687 : do_blockdev_direct_IO+0x757/0xbf0 [kernel]
>>>  0xffffffff811deb63 : __blockdev_direct_IO+0x43/0x50 [kernel]
>>>  0xffffffff811da8b8 : blkdev_direct_IO+0x58/0x80 [kernel]
>>>  0xffffffff8112b73f : generic_file_read_iter+0x13f/0x150 [kernel]
>>>  0xffffffff811d9fd7 : blkdev_read_iter+0x37/0x40 [kernel]
>>>  0xffffffff811a1d13 : __vfs_read+0xd3/0xf0 [kernel]
>>>  0xffffffff811a1ea7 : vfs_read+0x97/0xe0 [kernel]
>>>  0xffffffff811a287a : sys_read+0x5a/0xc0 [kernel]
>>>  0xffffffff8162102e : entry_SYSCALL_64_fastpath+0x12/0x71 [kernel]
>>> ----------------------------------
>>> Bound device: loop3
>>>
>>> At the same time I get the following output in dmesg:
>>>
>>> blk-mq: bad return on queue: -5   <-- This -EIO code is returned from loop_queue_rq
>>> blk_update_request: I/O error, dev loop3, sector 0
>>>
>>> To me this means it's possible that device disabling races with
>>> pending I/O plugs for this device. I wonder whether it would be possible
>>> to flush any plugs for a particular device before disabling its
>>> multiqueue? Or maybe delay the plug flushing until we know the device
>>
>> Yes, you should detach the loop device only after all pending I/Os to the
>> current loop device have completed. For example, umount and lvremove should
>> be run before deleting the loop device in your test case, and those paths
>> are totally controlled by user space.
>>
>>> is actually active. Though I can see a problem with the latter approach,
>>> since this would mean it's possible to have the following scenario:
>>>
>>> 1. Device is attached to the system and writes are going on normally
>>> 2. A process plugs the device and starts queuing I/O on the plug
>>> 3. The device is detached from the system
>>> 4. Plug flushing code detects (3) and waits until the device is re-attached
>>> 5. Device is re-attached
>>> 6. Plug from (4) is flushed.
>>>
>>> However, the device attached in (5) might not be the same device as in
>>> (1), and this would mean that (6) could end up writing essentially random
>>> data with respect to the device attached in (5).
>>
>> It is the user's responsibility to complete all pending I/O to the current
>> (old) loop device before a (new) loop device is attached again, because both
>> paths are ultimately driven from user space. Those I/Os will be completed
>> with -EIO and won't reach the backing file at all, so how can the above
>> case happen?
>
> It can't happen; I was just thinking out loud. As I have pointed out,
> this seems a rather bogus scenario.

OK, so there isn't a real problem in your report.

>
>>
>>>
>>> Essentially, is it normal to have I/O fail in such situations?
>>
>> cat init_vg.txt
>> ...
>> loopdev=$(losetup -f --show ${file})
>> pvcreate --metadatasize 1M ${loopdev}
>> vgcreate ${group} -s 1MiB ${loopdev}
>> ...
>> umount $mntpath
>> vgchange -Kan $group
>> losetup -d $loopdev
>>
>> As far as your test case above goes, it is normal for the I/O to fail after
>> the loop device is deleted, and you should have removed the volume group
>> first, before deleting the loop device.
>
> But in this case the filesystem (which is on the volume group, which is
> on the loop device) is unmounted, then the volume group is deactivated,

As I mentioned, you should have run lvremove before attaching/disabling
the loop device.

> which, at this point, should stop all I/O, and finally the loop device is
> nuked, yet I can still see I/O in transit. Based on this it seems that
> vgchange might not be flushing everything. I mostly see the failures
> occur with reads.

The reads may come from reading the partition table, and the loop device
just returns -EIO in this situation, so what is wrong with that?

--
Ming Lei
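
For reference, a teardown ordering along the lines Ming describes, written
against the init_vg.txt excerpt quoted above, might look like the sketch
below. This is only an illustration: it reuses the ${mntpath}, ${group} and
${loopdev} variables from that script, and the -f flags are an assumption
here to make the LVM commands non-interactive.

    # Tear down in the reverse order of setup, so nothing in LVM still
    # references the loop device by the time it is detached.
    umount ${mntpath}          # stop filesystem I/O first
    vgchange -Kan ${group}     # deactivate the logical volumes in the group
    lvremove -f ${group}       # remove all logical volumes in the group
    vgremove -f ${group}       # remove the volume group itself
    pvremove ${loopdev}        # wipe the PV label from the loop device
    losetup -d ${loopdev}      # only now detach the loop device

Even with this ordering, a stray read (for example from a partition-table
read around detach time) can still be completed with -EIO; as Ming notes
above, such I/O never reaches the backing file and is expected to fail
this way.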