On Tue, 14 Jul 2015, Gregory Farnum wrote:
> I spent a bunch of today looking at http://tracker.ceph.com/issues/12297.
>
> Long story short: the workload is doing a readdir at the same time as
> it's unlinking files. The readdir functions (in this case,
> _readdir_cache_cb) drop the client_lock each time they invoke the
> callback (for obvious reasons). There is some effort in
> _readdir_cache_cb to try and keep the iterator valid (we check on each
> loop that we aren't at end; we increment the iterator before dropping
> the lock), but it's not sufficient.
>
> Is there supposed to be something preventing this kind of race? If not
> I can work something out in the code but I've not done much work in
> that bit and there are enough pieces that I wonder if I'm missing some
> other issue.

What is the race you're worried about? Unlinking the file that we're
doing the callback on, or the one that follows it (where the iterator now
points)?

My guess is that in this case unlink should see that there is a reference
on the dentry and should make it NULL instead of unlinking it from the
directory entirely...

sage
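
A minimal sketch of the null-dentry idea suggested above, written as a toy
Python model rather than the actual Client code. Every name here (Dentry,
Directory, put, names) is hypothetical and chosen only for illustration; the
real client's data structures and locking are not modeled. The assumption is
that readdir pins both the entry it is calling back on and the entry its
iterator already points at, so that a concurrent unlink of either one only
nulls the dentry instead of removing it from the list.

```python
class Dentry:
    """Toy directory entry; inode=None models a null dentry."""
    def __init__(self, name, inode):
        self.name = name
        self.inode = inode
        self.refs = 0      # pins held by in-progress readdirs
        self.next = None   # singly linked list, standing in for the dir's dentry list


class Directory:
    def __init__(self, names):
        self.head = None
        prev = None
        for ino, name in enumerate(names, start=1):
            dn = Dentry(name, ino)
            if prev is None:
                self.head = dn
            else:
                prev.next = dn
            prev = dn

    def _remove(self, dn):
        # Detach dn from the list; anything still pointing at dn is now stale.
        if self.head is dn:
            self.head = dn.next
        else:
            prev = self.head
            while prev.next is not dn:
                prev = prev.next
            prev.next = dn.next
        dn.next = None

    def unlink(self, name):
        dn = self.head
        while dn is not None and dn.name != name:
            dn = dn.next
        if dn is None or dn.inode is None:
            return
        dn.inode = None        # always make the dentry null...
        if dn.refs == 0:
            self._remove(dn)   # ...but only detach it if nobody has it pinned

    def put(self, dn):
        # Drop a pin; a null dentry is detached once the last pin goes away.
        dn.refs -= 1
        if dn.refs == 0 and dn.inode is None:
            self._remove(dn)

    def readdir(self, callback):
        dn = self.head
        if dn is not None:
            dn.refs += 1
        while dn is not None:
            nxt = dn.next
            if nxt is not None:
                nxt.refs += 1      # pin where the iterator points before calling out
            if dn.inode is not None:
                callback(dn.name)  # the real client would drop its lock around this
            self.put(dn)
            dn = nxt

    def names(self):
        out, dn = [], self.head
        while dn is not None:
            if dn.inode is not None:
                out.append(dn.name)
            dn = dn.next
        return out


# Usage: the callback for "a" unlinks the next entry ("b") and a later one ("c").
d = Directory(["a", "b", "c", "d"])
seen = []

def cb(name):
    seen.append(name)
    if name == "a":
        d.unlink("b")   # pinned by readdir: becomes null, stays linked
        d.unlink("c")   # not pinned: removed from the list immediately

d.readdir(cb)
```

In this model the walk survives both cases asked about above: unlinking the
entry being called back on, and unlinking the entry the iterator has already
advanced to. Whether the real unlink path can be made to behave this way is
the open question in the thread.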