On Fri, Jul 8, 2016 at 7:51 PM, Patrick McLean <patrickm@xxxxxxxxxx> wrote:
> On Fri, Jul 8, 2016 at 4:40 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> On Fri, Jul 8, 2016 at 3:28 AM, Patrick McLean <patrickm@xxxxxxxxxx> wrote:
>>> This is on linus git master as of 2016/07/01
>>>
>>> These appear to be two separate deadlocks, one on a map operation,
>>> and one on an unmap operation. We can reproduce these pretty
>>> regularly, but it seems like there is some sort of race condition, as
>>> it happens nowhere near every time.
>>>
>>
>> It's a single deadlock between "rbd map" and a kworker thread, a later
>> "rbd unmap" is just a victim.
>>
>> Are you mapping the same image more than once?
>>
>
> We shouldn't be, there is a higher-level locking system that is
> supposed to prevent that.

It's actually allowed, I'm just trying to get an idea of what was going
on.

I spoke too soon.  The trace of pid 14109 is inconsistent - the entries
in the middle don't make sense.  Do you have a different set of traces
or are they all the same?

Were there other images mapped at the time 14109 was exec'ed?  Other
concurrent "rbd map" processes?

What was/is the state of the cluster?  Can you provide the output of
ceph -s?  Any stuck PGs?

Thanks,

                Ilya