> Are you still running kernel rbd as a client of ceph
> services running on the same physical machine?
>
> I personally believe that scenario may be at risk of
> deadlock in any case--we haven't taken great care to
> avoid it in this case.

Yes. Risking a deadlock on this machine is fine, though; we only use it
for development and testing.

> Anyway...
>
> I can build v3.14.1 but I don't know what kernel configuration
> you are using. Knowing that could be helpful. I built it using
> a config I have though, and it's *possible* you crashed on
> this line, in rbd_segment_name():
>         ret = snprintf(name, CEPH_MAX_OID_NAME_LEN + 1, name_format,
>                         rbd_dev->header.object_prefix, segment);
> And if so, the only reason I can think that this failed is if
> rbd_dev->header.object_prefix were null (or an otherwise bad
> pointer value). But at this point it's a lot of speculation.

Kernel config: http://pastebin.com/unZCzXZZ

> Depending on what your stress tests were doing, I suppose it
> could be that you unmapped an in-use rbd image and there was
> some sort of insufficient locking.
>
> Can you also give a little insight about what your stress
> tests were doing?

The stress test kept about 3 rbd volumes constantly mapped. A standard
web stack (LNMP) was installed on them, with a WordPress installation
that was hammered with requests to PHP, which in turn made further calls
to MySQL. All volumes used ext4, and one of them hosted the raw MySQL
InnoDB data files. From the stack trace it looks like mysqld did an
fsync, which caused the failure in rbd.

The server was otherwise completely unused; no concurrent rbd mapping
or unmapping took place. rbd was using layered mode, but there should
be at most about 3 layers.

Thank you for your time,
Hannes