On Wed, Jul 15, 2015 at 1:11 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> I spent a bunch of today looking at http://tracker.ceph.com/issues/12297.
>
> Long story short: the workload is doing a readdir at the same time as
> it's unlinking files. The readdir functions (in this case,
> _readdir_cache_cb) drop the client_lock each time they invoke the
> callback (for obvious reasons). There is some effort in
> _readdir_cache_cb to try and keep the iterator valid (we check on each
> loop that we aren't at end; we increment the iterator before dropping
> the lock), but it's not sufficient.
>
> Is there supposed to be something preventing this kind of race? If not
> I can work something out in the code but I've not done much work in
> that bit and there are enough pieces that I wonder if I'm missing some
> other issue.

I think calling (*pd)->get() before releasing the client_lock should work.

Regards
Yan, Zheng

> -Greg
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
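A minimal sketch of the pinning idea, written in C++ because the code under discussion is C++. Everything in it is hypothetical: `Dir`, `Dentry`, `get()`, `put()`, `unlink()`, `names()` and `readdir()` model the race described above and are not Ceph's actual Client code. `lock_` stands in for client_lock. The sketch assumes that a dentry unlinked while a reader has it pinned is only marked, and that the last `put()` erases it from the directory list. That deferral is what keeps the reader's iterator valid across the unlocked callback.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <list>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a cached directory; these are NOT
// Ceph's Client/Dir/Dentry types, only an illustration of the pattern.
struct Dentry {
  explicit Dentry(std::string n) : name(std::move(n)) {}
  std::string name;
  int ref = 0;            // pins held by readers that dropped the lock
  bool unlinked = false;  // logically removed; erase deferred to last put()
};

class Dir {
 public:
  using Iter = std::list<Dentry>::iterator;

  void add(const std::string& name) {
    std::lock_guard<std::mutex> g(lock_);
    dentries_.emplace_back(name);
  }

  // If a reader has the dentry pinned, only mark it; the last put()
  // erases it, so the reader's iterator stays valid.
  void unlink(const std::string& name) {
    std::lock_guard<std::mutex> g(lock_);
    for (Iter it = dentries_.begin(); it != dentries_.end(); ++it) {
      if (it->name == name && !it->unlinked) {
        it->unlinked = true;
        if (it->ref == 0) dentries_.erase(it);  // nobody pinned it
        return;
      }
    }
  }

  // Names of dentries still linked, in list order.
  std::vector<std::string> names() {
    std::lock_guard<std::mutex> g(lock_);
    std::vector<std::string> out;
    for (const Dentry& d : dentries_)
      if (!d.unlinked) out.push_back(d.name);
    return out;
  }

  // Calls cb for each linked dentry with lock_ dropped. The current
  // dentry is pinned across the unlocked window, in the spirit of
  // calling (*pd)->get() before releasing client_lock.
  void readdir(const std::function<void(const std::string&)>& cb) {
    std::unique_lock<std::mutex> g(lock_);
    Iter it = dentries_.begin();
    while (it != dentries_.end()) {
      if (it->unlinked) {  // pinned by another reader; skip it
        ++it;
        continue;
      }
      std::string name = it->name;  // copy while still locked
      get(it);                      // pin: unlink() cannot erase it now
      g.unlock();
      cb(name);                     // may race with unlink()/add()
      g.lock();
      Iter next = std::next(it);    // `it` is still a live list node
      put(it);                      // may erase `it`; `next` stays valid
      it = next;
    }
  }

 private:
  // Both require lock_ to be held.
  void get(Iter it) { ++it->ref; }
  void put(Iter it) {
    assert(it->ref > 0);
    if (--it->ref == 0 && it->unlinked) dentries_.erase(it);
  }

  std::mutex lock_;  // stands in for client_lock
  std::list<Dentry> dentries_;
};
```

If the callback for "a" unlinks both "a" and "b", the walk visits "a", "c", "d" and leaves only "c" and "d" in the directory, with no dangling iterator. This sketch pins the entry just visited; pinning the entry the iterator has already advanced to, as `(*pd)->get()` after the increment would, works the same way as long as the pinned entry cannot be erased until it is released. A reference count alone would not help if `unlink()` erased unconditionally, since `std::list::erase` invalidates iterators to the erased node.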