RE: debugging librbd async - valgrind memtest hit


 



> 
> On Fri, 30 Aug 2013, James Harper wrote:
> > I finally got a valgrind memtest hit... output attached below email. I
> > recompiled all of tapdisk and ceph without any -O options (thought I had
> > already...) and it seems to have done the trick
> 
> What version is this?  The line numbers don't seem to match up with my
> source tree.

0.67.2, but I've peppered it with debug prints, so the line numbers won't match a clean tree.

> > Basically it looks like an instance of AioRead is being accessed after
> > being freed. I need some hints on what API behaviour by the tapdisk
> > driver could be causing this to happen in librbd...
> 
> It looks like refcounting for the AioCompletion is off.  My first guess
> would be premature (or extra) calls to rados_aio_release or
> AioCompletion::release().
> 
> I did a quick look at the code and it looks like aio_read() is carrying a
> ref for the AioCompletion for the entire duration of the function, so it
> should not be disappearing (and taking the AioRead request struct with it)
> until well after where the invalid read is.  Maybe there is an error path
> somewhere that is dropping a ref it shouldn't?
> 
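To make the suspected failure mode concrete, here is a minimal sketch of how one extra put() on an error path can destroy a refcounted object while the function that took a ref is still running. Everything in it (ToyCompletion, toy_aio_read, the buggy_error_path flag) is hypothetical and only models the idea; it is not the librbd code, and a "destroyed" flag stands in for the real delete so the sketch stays safe to run.

```cpp
#include <cassert>

// Hypothetical stand-in for a refcounted completion; NOT the librbd class.
struct ToyCompletion {
    int ref = 1;             // the creator holds one reference
    bool destroyed = false;  // stands in for `delete this`

    void get() {
        assert(!destroyed);
        ++ref;
    }

    void put() {
        assert(!destroyed && ref > 0);
        if (--ref == 0) {
            destroyed = true;  // real code would free the object here
        }
    }
};

// Hypothetical request function that holds its own ref for its duration.
// buggy_error_path simulates an error path that drops one ref too many.
// Returns true if the completion was still alive at the point where the
// function would go on to touch it.
bool toy_aio_read(ToyCompletion &c, bool buggy_error_path) {
    c.get();  // ref held for the duration of the call

    if (buggy_error_path) {
        c.put();  // drops the function's own ref (fine)
        c.put();  // bug: also drops the caller's ref, ref hits 0
    }

    bool alive = !c.destroyed;  // a later access here would be the invalid read

    if (!buggy_error_path) {
        c.put();  // normal path: release the function's ref on the way out
    }
    return alive;
}
```

With buggy_error_path set to false, the call returns true and leaves ref at 1. With it set to true, the object is already marked destroyed before the function is done with it, which is the pattern valgrind would report as a read after free.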

I'll see if I can find a way to track that. It's the c->get() and c->put() calls that track this, right?
 
The crash seems a little bit different every time, so it could still be something stomping on memory, e.g. overwriting the ref count.
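
One way to tell those two cases apart would be a debug wrapper that logs every get/put with its call site and brackets the counter with canary words. The sketch below is only an illustration under that assumption: TracedRef, kCanary and the "where" strings are made-up names, not anything in librbd. A rejected call with intact canaries points at a refcount imbalance; a damaged canary points at something stomping on the memory around the counter.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical debug helper; names are illustrative, not librbd API.
struct TracedRef {
    static constexpr std::uint32_t kCanary = 0xC0FFEE11u;

    std::uint32_t front = kCanary;  // guard word before the counter
    int ref = 1;                    // creator's reference
    std::uint32_t back = kCanary;   // guard word after the counter

    // True while neither guard word has been overwritten.
    bool canaries_ok() const {
        return front == kCanary && back == kCanary;
    }

    // Take a reference. Returns false (and changes nothing) if the
    // canaries are damaged or the object has no live references.
    bool get(const char *where) {
        if (!canaries_ok() || ref <= 0) {
            std::fprintf(stderr, "BAD get %p ref=%d canaries=%d at %s\n",
                         static_cast<void *>(this), ref, canaries_ok(), where);
            return false;
        }
        ++ref;
        std::fprintf(stderr, "get %p ref=%d at %s\n",
                     static_cast<void *>(this), ref, where);
        return true;
    }

    // Drop a reference. Returns false (and changes nothing) on an extra
    // put or on damaged canaries.
    bool put(const char *where) {
        if (!canaries_ok() || ref <= 0) {
            std::fprintf(stderr, "BAD put %p ref=%d canaries=%d at %s\n",
                         static_cast<void *>(this), ref, canaries_ok(), where);
            return false;
        }
        --ref;
        std::fprintf(stderr, "put %p ref=%d at %s\n",
                     static_cast<void *>(this), ref, where);
        return true;
    }
};
```

Matching the logged get/put lines per pointer should show whether a put arrives without a corresponding get, and the "where" tag shows which caller issued it.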

Thanks

James




