Re: [CEPH-DEVEL] [ceph-users] occasional failure to unmap rbd


On Tue, Nov 24, 2015 at 12:49 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Tue, Nov 24, 2015 at 12:12 AM, Markus Kienast <elias1884@xxxxxxxxx> wrote:
>> Kernel Version
>> elias@paris3:~$ uname -a
>> Linux paris3.sfe.tv 3.16.0-28-generic #38-Ubuntu SMP Sat Dec 13
>> 16:13:28 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>>
>> Output of dmesg and /var/log/dmesg attached.
>> But it does not show much, except for one mon being down.
>> The mon is down for hardware reasons.
>>
>>
>>
>> On Mon, Nov 23, 2015 at 11:26 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>
>>> On Mon, Nov 23, 2015 at 11:03 PM, Markus Kienast <mark@xxxxxxxxxxxxx> wrote:
>>> > I am having the same issue here.
>>>
>>> Which kernel are you running?  Could you attach your dmesg?
>>>
>>> >
>>> > root@paris3:/etc/neutron# rbd unmap /dev/rbd0
>>> > rbd: failed to remove rbd device: (16) Device or resource busy
>>> > rbd: remove failed: (16) Device or resource busy
>>> >
>>> > root@paris3:/etc/neutron# rbd info -p volumes
>>> > volume-f3ab6892-f35e-4b98-8832-efbaaa2f4ca2
>>> > 2015-11-23 22:42:06.842697 7f2d57e49700  0 -- :/2760503703 >>
>>> > 10.90.90.4:6789/0 pipe(0x1773250 sd=3 :0 s=1 pgs=0 cs=0 l=1
>>> > c=0x17734e0).fault
>>> > rbd image 'volume-f3ab6892-f35e-4b98-8832-efbaaa2f4ca2':
>>> > size 500 GB in 128000 objects
>>> > order 22 (4096 kB objects)
>>> > block_name_prefix: rbd_data.1b6d9e2aaa998b
>>> > format: 2
>>> > features: layering
>>> > root@paris3:/etc/neutron# rados -p volumes listwatchers
>>> > rbd_header.1b6d9e2aaa998b
>>> > 2015-11-23 22:42:58.546723 7fec94fec700  0 -- :/2519796249 >>
>>> > 10.90.90.4:6789/0 pipe(0x9cf260 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x9cf4f0).fault
>>>
>>> Did you root cause these faults?
>>
>> Hardware failure caused these faults.
>>
>>>
>>> > watcher=10.90.90.3:0/3293327848 client.8471177 cookie=1
>>> >
>>> > root@paris3:/etc/neutron# ps ax | grep rbd
>>> >  7814 ?        S      0:00 [jbd2/rbd0-8]
>>>
>>> Was there an ext filesystem involved?  How was it umounted - do you
>>> have a "umount <mountpoint>" process stuck in D state?
>>
>> Yes, all these RBDs are formatted with ext4. I use them regularly
>> with OpenStack and have never had any problems.
>> I did "umount <mountpoint>" and the umount process actually
>> finished just fine.
>> Where can I check whether it is stuck in "D" state?
>>
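One way to check is `ps -eo pid,stat,comm`, where a STAT value starting with D marks uninterruptible sleep. The same information can be read directly from /proc. Below is a minimal Python sketch that assumes a Linux /proc layout; the helper names `proc_state` and `d_state_tasks` are hypothetical:

```python
import os
from typing import List, Tuple


def proc_state(stat_line: str) -> Tuple[int, str, str]:
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' instead of on whitespace.
    """
    pid_part, rest = stat_line.split("(", 1)
    comm, after = rest.rsplit(")", 1)
    state = after.split()[0]  # one-letter state code, e.g. 'S', 'R', 'D'
    return int(pid_part), comm, state


def d_state_tasks(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return (pid, comm) for every task currently in uninterruptible sleep ('D')."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = proc_state(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited (or became unreadable) mid-scan
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `d_state_tasks()` as root after the umount lists any task stuck in D state. A hung umount would appear there with comm "umount".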
>>>
>>> > 11003 ?        S      0:00 [jbd2/rbd1-8]
>>> > 14042 ?        S      0:00 [jbd2/rbd2p1-8]
>>> > 24228 ?        S      0:00 [jbd2/rbd3-8]
>>> >
>>> > root@paris3:/etc/neutron# ceph --version
>>> > ceph version 0.80.11 (8424145d49264624a3b0a204aedb127835161070)
>>> >
>>> > root@paris3:/etc/neutron# ls /sys/block/rbd0/holders/
>>> > returns nothing
>>> >
>>> > root@paris3:/etc/neutron# fuser -amv /dev/rbd0
>>> >                      USER        PID ACCESS COMMAND
>>> > /dev/rbd0:
>>>
>>> What's the output of "cat /sys/bus/rbd/devices/0/client_id"?
>>
>> root@paris3:~# cat /sys/bus/rbd/devices/0/client_id
>> client8471177
>>
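The watcher listed earlier (client.8471177) and this client_id (client8471177) carry the same numeric id. This suggests that the watch on the image header is held by this kernel mapping. Below is a minimal Python sketch of that comparison. It assumes the two text formats exactly as they appear in this thread, and the function names are hypothetical:

```python
import re
from typing import Optional

# Format assumed from the "rados listwatchers" output above:
#   watcher=<addr> client.<id> cookie=<n>
WATCHER_RE = re.compile(r"watcher=(\S+)\s+client\.(\d+)\s+cookie=(\d+)")


def watcher_client_id(line: str) -> Optional[int]:
    """Extract the numeric client id from a listwatchers line, if present."""
    m = WATCHER_RE.search(line)
    return int(m.group(2)) if m else None


def sysfs_client_id(text: str) -> Optional[int]:
    """Extract the numeric id from /sys/bus/rbd/devices/<n>/client_id ('client<id>')."""
    m = re.fullmatch(r"client(\d+)", text.strip())
    return int(m.group(1)) if m else None


def watch_held_by_mapping(watcher_line: str, client_id_text: str) -> bool:
    """True if the watcher's client id equals the mapped device's client id."""
    w = watcher_client_id(watcher_line)
    return w is not None and w == sysfs_client_id(client_id_text)
```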
>>>
>>> What's the output of "sudo cat /sys/kernel/debug/ceph/*/osdc"?
>>
>> root@paris3:~# ls -l /sys/kernel/debug/ceph/
>> total 0
>> drwxr-xr-x 2 root root 0 Feb  4  2015
>> 32ba3117-e320-49fc-aabd-f100d5a7e94b.client7663711
>> drwxr-xr-x 2 root root 0 Nov 23 11:41
>> 32ba3117-e320-49fc-aabd-f100d5a7e94b.client8471177
>>
>> root@paris3:~# cat
>> /sys/kernel/debug/ceph/32ba3117-e320-49fc-aabd-f100d5a7e94b.client8471177/osdc
>> has no output
>
> This means there are no outstanding/hung rbd I/Os.  According to you,
> umount completed successfully, and yet there is a jbd2/rbd0-8 kthread
> hanging around, keeping /dev/rbd0 open and holding a ref to it.
> A quick search produced two similar reports:
>
> [1] https://ask.fedoraproject.org/en/question/7572/how-to-stop-kernel-ext4-journaling-thread/
> [2] http://lists.openwall.net/linux-ext4/2015/10/24/11
>
> The only difference, as far as I can tell, is that those people noticed the jbd2
> thread because they wanted to run fsck, while you ran into it because
> you tried to do "rbd unmap".  Neither mentions rbd.
>
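The two observations in the paragraph above (an empty osdc file meaning no outstanding I/O, and a leftover jbd2 kthread for the device) can be sketched in Python. This sketch makes several assumptions. It assumes the osdc format of the 3.16 kernel used here, where an idle client's file is empty; other kernel versions may format this file differently. It also assumes the thread naming jbd2/<device>-<journal inode> seen in the ps output. The helpers are hypothetical:

```python
import re
from typing import List, Tuple


def osdc_pending(osdc_text: str) -> int:
    """Count non-blank lines in a debugfs 'osdc' file.

    Assumption: each line describes one in-flight OSD request, so an
    empty file means no outstanding I/O for that client.
    """
    return sum(1 for line in osdc_text.splitlines() if line.strip())


def jbd2_threads_for(dev: str, tasks: List[Tuple[int, str]]) -> List[int]:
    """Return PIDs of jbd2 kthreads for `dev` or one of its partitions.

    `tasks` is a list of (pid, comm) pairs, e.g. collected from
    `ps -eo pid,comm`. Names look like 'jbd2/rbd0-8' or 'jbd2/rbd2p1-8'.
    """
    pattern = re.compile(r"jbd2/" + re.escape(dev) + r"(p\d+)?-\d+")
    return [pid for pid, comm in tasks if pattern.fullmatch(comm)]
```

With the ps output from this thread, `jbd2_threads_for("rbd0", ...)` would pick out PID 7814. The anchored match keeps "rbd1" from also matching "rbd10".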
> Looking at [2], did you at any point see similar errors in dmesg?
>
>>
>> root@paris3:~# cat
>> /sys/kernel/debug/ceph/32ba3117-e320-49fc-aabd-f100d5a7e94b.client7663711/osdc
>> hangs with no output
>
> It shouldn't hang, so it could be unrelated.  Given the "Feb  4  2015"

It should read "so it could be related", of course.

Thanks,

                Ilya


