Re: deadlocks in rbd unmap and map

On the ceph clients we haven't seen any ceph/rbd-related messages in dmesg other than these:

[28623.056936] libceph: mon0 10.9.228.50:6789 session established
[28623.060888] libceph: client773380 fsid e88d2684-47c1-5a64-a275-6e375d11b557
[28623.131952] rbd: rbd0: added with size 0x2540be400
[28624.040816] EXT4-fs (rbd0): mounted filesystem without journal. Opts: (null)
[26123.339863] libceph: osd1 10.9.228.101:6804 socket closed (con state OPEN)


We can enable "debug ms = 1" on the OSD servers.
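
For example, something like this (or the equivalent "debug ms = 1"
entry in the [osd] section of ceph.conf) should turn it on across the
cluster without a restart:

# ceph tell osd.* injectargs '--debug-ms 1'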

We'll get you the output next time we are able to reproduce it (this
weekend's load test failed to reproduce the error with kernel
4.7.0-rc6-vanilla-ams-3-00134-gcfae7e3-dirty).
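
When it reproduces, the plan is to capture roughly the following from
the affected client (the fsid/client-id directory name under debugfs
differs per mount, hence the glob):

# cat /sys/kernel/debug/ceph/*/osdc
# grep -C 40 rbd /proc/*/stack
# dmesg | tail -n 100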

It doesn't happen when we have a single instance of map/unmap running;
it seems to take an hour or more of the load test to trigger.
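
For context, each worker in the load test boils down to a loop of
roughly this shape (pool/image names and the actual I/O are elided;
this is a sketch of the test's shape, not the real harness):

while true; do
    dev=$(rbd map --pool "$POOL" "$IMAGE")  # prints /dev/rbdN on success
    mount "$dev" /mnt/test
    # ... exercise the filesystem ...
    umount /mnt/test
    rbd unmap "$dev"
done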

On Sat, Jul 9, 2016 at 7:30 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Fri, Jul 8, 2016 at 10:30 PM, Patrick McLean <patrickm@xxxxxxxxxx> wrote:
>> On Fri, Jul 8, 2016 at 11:46 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>> On Fri, Jul 8, 2016 at 7:51 PM, Patrick McLean <patrickm@xxxxxxxxxx> wrote:
>>>> On Fri, Jul 8, 2016 at 4:40 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>>> On Fri, Jul 8, 2016 at 3:28 AM, Patrick McLean <patrickm@xxxxxxxxxx> wrote:
>>>>>> This is on Linus' git master as of 2016/07/01
>>>>>>
>>>>>> These appear to be two separate deadlocks, one on a map operation
>>>>>> and one on an unmap operation. We can reproduce these pretty
>>>>>> regularly, but it seems like there is some sort of race condition,
>>>>>> as it happens nowhere near every time.
>>>>>>
>>>>
>>>>>
>>>>> It's a single deadlock between "rbd map" and a kworker thread, a later
>>>>> "rbd unmap" is just a victim.
>>>>>
>>>>> Are you mapping the same image more than once?
>>>>>
>>>>
>>>> We shouldn't be, there is a higher-level locking system that is
>>>> supposed to prevent that.
>>>
>>> It's actually allowed, I'm just trying to get an idea of what was going
>>> on.
>>>
>>> I spoke too soon.  The trace of pid 14109 is inconsistent - the entries
>>> in the middle don't make sense.  Do you have a different set of traces
>>> or are they all the same?
>>
>> Here is another backtrace. It seems that once it happens to a
>> particular image once, it always does the same thing. We have not
>> figured out what triggers the first hang.
>>
>> # grep -C 40 rbd /proc/*/stack
>> /proc/5295/stack-[<ffffffff8111c7c4>] __queue_work+0x144/0x420
>> /proc/5295/stack-[<ffffffff8112c1c0>] default_wake_function+0x0/0x10
>> /proc/5295/stack:[<ffffffffa01621ee>] rbd_obj_method_sync.constprop.48+0x1be/0x290 [rbd]
>> /proc/5295/stack:[<ffffffffa0162aea>] rbd_dev_header_info+0x15a/0x930 [rbd]
>> /proc/5295/stack:[<ffffffffa0163450>] rbd_watch_cb+0x0/0xa0 [rbd]
>> /proc/5295/stack:[<ffffffffa0163586>] rbd_dev_image_probe.part.42+0x96/0x910 [rbd]
>> /proc/5295/stack:[<ffffffffa016240e>] rbd_dev_image_id+0x14e/0x1b0 [rbd]
>> /proc/5295/stack:[<ffffffffa016428f>] do_rbd_add.isra.43+0x48f/0xbb0 [rbd]
>> /proc/5295/stack-[<ffffffff8123ba27>] __kmalloc+0x27/0x170
>> /proc/5295/stack-[<ffffffff812b7f3a>] kernfs_fop_write+0x10a/0x190
>> /proc/5295/stack-[<ffffffff8124dc63>] __vfs_write+0x23/0x120
>> /proc/5295/stack-[<ffffffff8124e8f3>] vfs_write+0xb3/0x1a0
>> /proc/5295/stack-[<ffffffff8124fbd2>] SyS_write+0x42/0xa0
>> /proc/5295/stack-[<ffffffff81816f72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
>> /proc/5295/stack-[<ffffffffffffffff>] 0xffffffffffffffff
>>
>> # ps aux | grep 5295
>> root      5295  0.0  0.0 246704 10352 pts/4    Sl+  20:20   0:00 rbd map --pool XXXXXXX XXXXXXXX
>
> Anything in dmesg?  It looks like rbd map is waiting for a reply to one
> of the header read-in requests and is never woken up.  rbd map holds
> a semaphore, blocking the kworker, which blocks rbd unmap.
>
> Can you set "debug ms = 1" on all OSDs and try to reproduce?
> I'd also like to see the content of /sys/kernel/debug/ceph/*/osdc when
> it happens again.
>
> How long does it take to hit it on average?
>
> Thanks,
>
>                 Ilya



-- 
Victor Payno
ビクター·ペイン

Sr. Release Engineer
シニアリリースエンジニア



Gaikai, a Sony Computer Entertainment Company   ∆○×□
ガイカイ、ソニー・コンピュータエンタテインメント傘下会社
65 Enterprise
Aliso Viejo, CA 92656 USA

Web: www.gaikai.com
Email: vpayno@xxxxxxxxxx
Phone: (949) 330-6850


