> On Mar 2, 2016, at 6:26 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote: > > Hi Doug, > > The cause of that memory corruption is a premature (duplicate, too) > call to rbd_obj_request_complete() in the !object-map DELETE case. > You've got: > > <dispatch> > rbd_osd_req_callback > rbd_osd_delete_callback > rbd_osd_discard_callback > rbd_obj_request_complete > <complete obj_request->completion> > <waiter is woken up> > ... > rbd_obj_request_put > <obj_request is gone> > <do more things with obj_request> <- !!! > rbd_obj_request_complete > <complete obj_request->completion> Ah, yes, that is left over from before when I used the delete callback explicitly (assigning it to r_callback) instead of routing it through rbd_osd_req_callback like everything else. That likely explains it, thanks. > I also spotted two memory leaks on the NOTIFY_COMPLETE path in > __do_event(). The event one is trivial, We cannot free the event here because CEPH_WATCH_EVENT_NOTIFY_COMPLETE fills values needed by the caller that may be waiting on it. The waiter has to call ceph_osdc_cancel_event when done with it. > the page vector one I have > a question about. The data item is allocated in alloc_msg() and the > actual buffer is then passed into __do_event() and eventually into > rbd_send_async_notify(), but not further up the stack. Is anything > going to use it? If not, we should remove it entirely. The incoming message may contain the data item. In the case of CEPH_WATCH_EVENT_NOTIFY_COMPLETE, rbd does not expect any data from the OSD, only the return code. The protocol allows it to be present, though, so osd_client should handle this case. > Another thing that caught my eye is your diff adds a bunch of > ceph_get_snap_context() calls on header.snapc with no corresponding > puts. My understanding is the ones around rbd_image_request_fill() are > there to workaround the fact that rbd_queue_workfn() isn't used, Yes. I think I left a comment there, but I’ll add one if I did not. > but > the one in rbd_obj_delete_sync() is immediately followed by > ceph_osdc_build_request() which bumps snapc and so is almost certainly > a leak. Yes. I thought you promised not to look at the garbage/junk commits, where that line is. > The attached patch fixes the use-after-free and plugs those leaks. > With it applied your test loop runs fine for me - no crashes or out of > memory problems. > > Thanks, > > Ilya > <doug-testing.diff> -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html