Re: Disappearing device during device plugging causes io errors.

On 01/05/2016 03:34 AM, Ming Lei wrote:
> On Mon, Jan 4, 2016 at 11:56 PM, Nikolay Borisov
> <n.borisov@xxxxxxxxxxxxxx> wrote:
>>
>>
>> On 01/04/2016 05:44 PM, Ming Lei wrote:
>>> On Mon, Jan 4, 2016 at 11:31 PM, Nikolay Borisov
>>> <n.borisov@xxxxxxxxxxxxxx> wrote:
>>>> Hi Ming,
>>>>
>>>> On 01/04/2016 05:23 PM, Ming Lei wrote:
>>>>> On Mon, Jan 4, 2016 at 4:21 PM, Nikolay Borisov
>>>>> <n.borisov@xxxxxxxxxxxxxx> wrote:
>>>>>> Hello block people,
>>>>>>
>>>>>> I'm running some experiments using the attached init_vg.txt script, and
>>>>>> at the same time I have the following systemtap script active:
>>>>>>
>>>>>> probe kernel.statement("loop_clr_fd@drivers/block/loop.c:896") {
>>>>>>         printf("Unbound device %s\n", kernel_string($lo->lo_disk->disk_name));
>>>>>> }
>>>>>>
>>>>>>
>>>>>> probe kernel.statement("loop_set_fd@drivers/block/loop.c:780") {
>>>>>>         printf("Bound device: %s\n", kernel_string($lo->lo_disk->disk_name));
>>>>>>         //print_backtrace();
>>>>>> }
>>>>>>
>>>>>> probe kernel.statement("__blk_mq_run_hw_queue@block/blk-mq.c:814") {
>>>>>>         printf("error in blk_mq_run_hq_queue for dev %s\n", kernel_string($bd->rq->rq_disk->disk_name));
>>>>>>         print_backtrace();
>>>>>>         print("----------------------------------\n");
>>>>>> }
>>>>>>
>>>>>> This produces the following output from time to time:
>>>>>>
>>>>>> Unbound device loop3
>>>>>> error in blk_mq_run_hq_queue for dev loop3
>>>>>>  0xffffffff8134ef6b : __blk_mq_run_hw_queue+0x29b/0x380 [kernel]
>>>>>>  0xffffffff8134f10a : blk_mq_run_hw_queue+0x6a/0x80 [kernel]
>>>>>>  0xffffffff8134faeb : blk_mq_insert_requests+0xdb/0x120 [kernel]
>>>>>>  0xffffffff8134fc54 : blk_mq_flush_plug_list+0x124/0x140 [kernel]
>>>>>>  0xffffffff81346886 : blk_flush_plug_list+0xc6/0x1f0 [kernel]
>>>>>>  0xffffffff813469e4 : blk_finish_plug+0x34/0x50 [kernel]
>>>>>>  0xffffffff811de687 : do_blockdev_direct_IO+0x757/0xbf0 [kernel]
>>>>>>  0xffffffff811deb63 : __blockdev_direct_IO+0x43/0x50 [kernel]
>>>>>>  0xffffffff811da8b8 : blkdev_direct_IO+0x58/0x80 [kernel]
>>>>>>  0xffffffff8112b73f : generic_file_read_iter+0x13f/0x150 [kernel]
>>>>>>  0xffffffff811d9fd7 : blkdev_read_iter+0x37/0x40 [kernel]
>>>>>>  0xffffffff811a1d13 : __vfs_read+0xd3/0xf0 [kernel]
>>>>>>  0xffffffff811a1ea7 : vfs_read+0x97/0xe0 [kernel]
>>>>>>  0xffffffff811a287a : sys_read+0x5a/0xc0 [kernel]
>>>>>>  0xffffffff8162102e : entry_SYSCALL_64_fastpath+0x12/0x71 [kernel]
>>>>>> ----------------------------------
>>>>>> Bound device: loop3
>>>>>>
>>>>>> At the same time I get the following output in dmesg:
>>>>>> blk-mq: bad return on queue: -5 <-- This -EIO code is returned from loop_queue_rq
>>>>>> blk_update_request: I/O error, dev loop3, sector 0
>>>>>>
>>>>>> To me this means it's possible that device disabling races with
>>>>>> pending IO plugs for this device. I wonder whether it would be possible
>>>>>> to flush any plugs for a particular device before disabling its
>>>>>> multiqueue? Or maybe delay the plug flushing until we know the device
>>>>>
>>>>> Yes, you should detach the loop device only after all pending I/Os to
>>>>> the current loop device have completed. For example, umount and lvremove
>>>>> should be run before deleting the loop device in your test case, and
>>>>> these paths are entirely controlled by user space.
>>>>>
>>>>>> is actually active. Though I can see a problem with the latter approach
>>>>>> since this would mean it's possible to have the following scenario:
>>>>>>
>>>>>> 1. Device is attached to system and writes are going normally
>>>>>> 2. A process plugs the device and starts queuing IO on the plug
>>>>>> 3. The device is detached from the system
>>>>>> 4. Plug flushing code detects (3) and waits until device is re-attached
>>>>>> 5. Device is reattached
>>>>>> 6. Plug from (4) is flushed.
>>>>>>
>>>>>> However, the device attached in (5) might not be the same device as in
>>>>>> (1), which would mean that (6) would be writing potentially random
>>>>>> data with respect to the device attached in (5).
>>>>>
>>>>> It is the user's responsibility to complete all pending I/O to the
>>>>> current (old) loop device before the (new) loop device is attached
>>>>> again, because both paths are ultimately driven from user space. These
>>>>> I/Os will be completed with -EIO and won't reach the backing file at
>>>>> all, so how can the above case happen?
>>>>
>>>> It can't happen; I was just thinking out loud. As I pointed out, this
>>>> seems a rather bogus scenario.
>>>
>>> OK, so there isn't a real problem in your report.
>>
>> I just want to account for all IO, and seeing some seemingly random
>> IO errors was putting me off.
> 
> No, it is definitely not a random IO error; all IO will fail after
> the loop device is detached.
> 
>>
>>>>>> Essentially, is it normal to have IO fail in such situations?
>>>>>
>>>>>     cat init_vg.txt
>>>>>     ...
>>>>>     loopdev=$(losetup -f --show ${file})
>>>>>     pvcreate --metadatasize 1M ${loopdev}
>>>>>     vgcreate ${group} -s 1MiB ${loopdev}
>>>>>     ...
>>>>>     umount $mntpath
>>>>>     vgchange -Kan $group
>>>>>     losetup -d $loopdev
>>>>>
>>>>> As for your test case above, it is normal for IO to fail after the
>>>>> loop device is deleted; you should have removed the volume group
>>>>> before deleting the loop device.
>>>>
>>>> But in this case the filesystem (which is on the volume group, which is
>>>> on the loop device) is unmounted, then the volume group is deactivated,
>>>
>>> As I mentioned, you should have run lvremove before attaching/disabling
>>> the loop.
>>
>> But lvremove would delete my volumes, whereas I do not want to delete
>> them, just deactivate them (which is what lvchange -Kan is supposed to do)
> 
> OK, that looks fine.
> 
>> and then remove the loop device so that I can, for example, transfer the
>> VG by just moving the single loopback image. I will run more tests to
>> see which process the failures come from.
>>
>>>
>>>> which, at this point, should stop all IO, and finally the loop device
>>>> is nuked, yet I can still see IO in flight. Based on this it seems that
>>>> vgchange might not be flushing everything. I mostly see the failures
>>>> occur with reads.
>>>
>>> The read may come from reading the partition table, and the loop device
>>> just returns -EIO in this situation, so what is wrong with that?
>>
>> Will have to check this.

Modifying the stap script to show the process generating the failure
revealed that it is mainly lvchange and sometimes (at the beginning of
the test) the vgcreate command. This, coupled with the fact that the
failures happen during direct IO, thus bypassing the filesystem, could
indeed indicate that what you are saying (reading the partition table
or other metadata from the volume) is true. However, see my concerns
below.
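
For reference, the probe modification was roughly along these lines (a
minimal sketch; execname() and pid() are the standard SystemTap tapset
helpers for the current task):

probe kernel.statement("__blk_mq_run_hw_queue@block/blk-mq.c:814") {
        # report the task in whose context the queue is being run,
        # alongside the failing device
        printf("error in blk_mq_run_hq_queue for dev %s from %s (pid %d)\n",
               kernel_string($bd->rq->rq_disk->disk_name),
               execname(), pid());
        print_backtrace();
}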

> 
> OK.
> 
> I still can't see any problem in your report up to now. If you think
> it is a real problem, please describe the observable user-visible
> effect explicitly.

So when I run the test just once, everything works as expected: all the
commands in the test are synchronous, so there should be no lingering IO
while the loop device is being removed, since the removal happens after
the filesystem is unmounted and lvchange has finished executing.
However, when I run multiple instances of the test case, e.g.

for i in {1..6}; do ./init_vg.sh > /dev/null & done

where the number of instances is chosen to equal the number of loopback
devices on the system, I start to see the aforementioned IO failures.
They are always random with respect to when they happen and which
loopback device they hit. Given the structure of the test case, which
always generates unique names, with each instance working on its own
dedicated loopback device, I find it odd that I see the IO failures
when running multiple instances but not when running a single one.
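
If the stray reads really do come from something asynchronously
rescanning the device (udev reacting to the LVM changes is my
assumption here, not something I have confirmed), one workaround might
be to drain those events before detaching, roughly:

    umount "$mntpath"
    vgchange -Kan "$group"
    # assumption: wait for any udev-triggered scans (partition table
    # reads and the like) to drain before the loop device goes away
    udevadm settle
    losetup -d "$loopdev"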

Regards,
Nikolay



