On Thu, Mar 27, 2014 at 9:48 AM, Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:
> On Wednesday, 26 March 2014 at 15:58 -0500, Alex Elder wrote:
>> Olivier reports that with the simple patch I provided
>> (which changed a "<" to a "!=" and removed an assertion)
>> he is running successfully.
>>
>> To me this is fantastic news, and you can see I posted
>> a patch with the fix.
>>
>> There remains a race condition though, one which I described
>> in a separate message earlier today. I don't think it will
>> prove to be a problem in practice, but I agreed to work on
>> a fix to ensure the race condition is eliminated. It will
>> require some work with reference counting image and object
>> requests.
>>
>> The fix won't be coming today. But I aim to provide it
>> in a matter of several days.
>>
>> -Alex
>>
>
> One question from one of my customers: why am I the only one to
> complain about that problem?
> I know that Ceph users often use qemu/librbd instead of kernel client,
> but what is the trigger of those "race condition"? Having "multiple
> requests" per RBD image? It should be a normal use, no?
>
> If someone can help me give an explanation, thanks :)

We've had a couple more similar reports in the last few months.
However, you are the first reporter who was able to trigger this race
often enough to track it down.

This race condition (read: bug) is specific to the kernel client;
qemu/librbd is unaffected. Having an rbd request that spans multiple
RADOS objects and therefore results in multiple object requests is
normal use; it's just that this particular piece of code turned out
to be prone to a subtle race.
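
To make "spans multiple RADOS objects" concrete, here is a rough
sketch of how one image request could be cut into per-object requests.
It is not the kernel code: it assumes the default 4 MiB object size
and plain (non-fancy) striping, and the function name is made up for
illustration.

```python
# Assumption: default rbd object size of 4 MiB (object order 22).
OBJECT_SIZE = 4 * 1024 * 1024


def split_into_object_requests(offset, length, object_size=OBJECT_SIZE):
    """Hypothetical helper: map an image byte range onto per-object pieces.

    Returns a list of (object_number, offset_in_object, length) tuples.
    """
    requests = []
    end = offset + length
    while offset < end:
        obj_no = offset // object_size
        obj_off = offset % object_size
        # Stop at the object boundary or at the end of the request,
        # whichever comes first.
        chunk = min(object_size - obj_off, end - offset)
        requests.append((obj_no, obj_off, chunk))
        offset += chunk
    return requests


# A 1 MiB request starting 512 KiB before an object boundary
# ends up as two object requests.
reqs = split_into_object_requests(4 * 1024 * 1024 - 512 * 1024, 1024 * 1024)
print(reqs)
```

Any request that crosses an object boundary yields more than one
entry, and it is the completion of those separate pieces that has to
be tracked together.
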

You have to keep in mind that races are all about timing and the
relative order of events, so simply issuing a multi-object rbd request
is not enough to trigger it; the stars have to align too ;)

Thanks,

                Ilya