> On Fri, 30 Aug 2013, James Harper wrote:
> > I finally got a valgrind memtest hit... output attached below email. I
> > recompiled all of tapdisk and ceph without any -O options (thought I had
> > already...) and it seems to have done the trick
>
> What version is this? The line numbers don't seem to match up with my
> source tree.

0.67.2, but I've peppered it with debug prints, so the line numbers are shifted.

> > Basically it looks like an instance of AioRead is being accessed after
> > being free'd. I need some hints on what api behaviour by the tapdisk
> > driver could be causing this to happen in librbd...
>
> It looks like refcounting for the AioCompletion is off. My first guess
> would be premature (or extra) calls to rados_aio_release or
> AioCompletion::release().
>
> I did a quick look at the code and it looks like aio_read() is carrying a
> ref for the AioCompletion for the entire duration of the function, so it
> should not be disappearing (and taking the AioRead request struct with it)
> until well after where the invalid read is. Maybe there is an error path
> somewhere that is dropping a ref it shouldn't?

I'll see if I can find a way to track that. It's the c->get() and c->put() calls that track this, right?

The crash seems a little bit different every time, so it could still be something stomping on memory, e.g. overwriting the ref count.

Thanks

James
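One way the get()/put() tracking mentioned above could be approached is to log every call with its call site and flag any call that arrives after the count has already hit zero. The following is only a minimal sketch: TrackedCompletion and simulate_read are hypothetical names made up for illustration, not librbd's AioCompletion or its API, and the object is deliberately never deleted so the late call can be recorded rather than crash.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a refcounted completion (NOT librbd's class).
struct TrackedCompletion {
    int ref = 1;                   // the creator holds the first reference
    bool released = false;         // set once the count reaches zero
    int misuse = 0;                // number of get/put calls after release
    std::vector<std::string> log;  // one entry per call, with its call site

    void get(const std::string &where) {
        if (released) {
            ++misuse;
            log.push_back("BAD get after release @ " + where);
            return;
        }
        ++ref;
        log.push_back("get @ " + where + " -> " + std::to_string(ref));
    }

    void put(const std::string &where) {
        if (released) {
            ++misuse;
            log.push_back("BAD put after release @ " + where);
            return;
        }
        --ref;
        log.push_back("put @ " + where + " -> " + std::to_string(ref));
        if (ref == 0)
            released = true;       // real code would free the object here
    }
};

// Assumed call pattern: the read path takes a ref on entry and drops it on
// exit, then the caller releases its own ref. The optional stray put models
// the kind of error-path bug suspected in the thread.
inline TrackedCompletion simulate_read(bool buggy_error_path) {
    TrackedCompletion c;               // ref == 1, owned by the caller
    c.get("aio_read entry");           // ref == 2
    if (buggy_error_path)
        c.put("error path");           // stray put: ref == 1
    c.put("aio_read exit");            // 1 normally, 0 if the stray put ran
    c.put("caller release");           // 0 normally, flagged if buggy
    return c;
}
```

Calling simulate_read(true) and printing the log shows the unbalanced put and where it came from. A tracker like this only catches unbalanced calls; it would not help if something else is overwriting the counter in memory.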