Re: deadlocks in rbd unmap and map

Ilya Dryomov <idryomov@xxxxxxxxx> · Fri, 8 Jul 2016 20:46:16 +0200

On Fri, Jul 8, 2016 at 7:51 PM, Patrick McLean <patrickm@xxxxxxxxxx> wrote:
> On Fri, Jul 8, 2016 at 4:40 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> On Fri, Jul 8, 2016 at 3:28 AM, Patrick McLean <patrickm@xxxxxxxxxx> wrote:
>>> This is on linus git master as of 2016/07/01
>>>
>>> These appear to be two separate deadlocks, one on a map operation,
>>> and one on an unmap operation. We can reproduce these pretty
>>> regularly, but it seems like there is some sort of race condition, as
>>> it happens nowhere near every time.
>>>
>
>>
>> It's a single deadlock between "rbd map" and a kworker thread, a later
>> "rbd unmap" is just a victim.
>>
>> Are you mapping the same image more than once?
>>
>
> We shouldn't be, there is a higher-level locking system that is
> supposed to prevent that.

It's actually allowed; I'm just trying to get an idea of what was going
on.

I spoke too soon.  The trace of pid 14109 is inconsistent - the entries
in the middle don't make sense.  Do you have a different set of traces
or are they all the same?

Were there other images mapped at the time 14109 was exec'ed?  Other
concurrent "rbd map" processes?

What was/is the state of the cluster?  Can you provide the output of
ceph -s?  Any stuck PGs?

Thanks,

                Ilya