Hi Doug,

The cause of that memory corruption is a premature (duplicate, too)
call to rbd_obj_request_complete() in the !object-map DELETE case.
You've got:

<dispatch>
  rbd_osd_req_callback
    rbd_osd_delete_callback
      rbd_osd_discard_callback
        rbd_obj_request_complete
          <complete obj_request->completion>
                                        <waiter is woken up>
                                        ...
                                        rbd_obj_request_put
                                          <obj_request is gone>
    <do more things with obj_request>  <- !!!
    rbd_obj_request_complete
      <complete obj_request->completion>

I also spotted two memory leaks on the NOTIFY_COMPLETE path in
__do_event().  The event one is trivial; the page vector one I have
a question about.  The data item is allocated in alloc_msg() and the
actual buffer is then passed into __do_event() and eventually into
rbd_send_async_notify(), but not further up the stack.  Is anything
going to use it?  If not, we should remove it entirely.

Another thing that caught my eye is that your diff adds a bunch of
ceph_get_snap_context() calls on header.snapc with no corresponding
puts.  My understanding is that the ones around rbd_image_request_fill()
are there to work around the fact that rbd_queue_workfn() isn't used,
but the one in rbd_obj_delete_sync() is immediately followed by
ceph_osdc_build_request(), which bumps snapc itself, and so is almost
certainly a leak.

The attached patch fixes the use-after-free and plugs those leaks.
With it applied your test loop runs fine for me - no crashes or out
of memory problems.

Thanks,

                Ilya

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 92c354256055..c0198b6ca605 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1930,14 +1930,14 @@ static void rbd_osd_delete_callback(struct rbd_obj_request *obj_request)
 	u8 current_state;
 
 	if (!obj_request->img_request) {
-		rbd_osd_complete_delete(obj_request);
+		rbd_osd_discard_callback(obj_request);
 		return;
 	}
 
 	rbd_dev = obj_request->img_request->rbd_dev;
 
 	if (!rbd_use_object_map(rbd_dev)) {
-		rbd_osd_complete_delete(obj_request);
+		rbd_osd_discard_callback(obj_request);
 		return;
 	}
 
@@ -3632,10 +3632,13 @@ static int rbd_send_async_notify(struct rbd_device *rbd_dev,
 	}
 
 	completed = ceph_osdc_wait_event(osdc, notify_event);
-	if (!completed)
+	if (!completed) {
 		ret = -ETIMEDOUT;
-	else
+	} else {
 		ret = notify_event->notify.return_code;
+		ceph_release_page_vector(notify_event->notify.notify_data,
+			calc_pages_for(0, notify_event->notify.notify_data_len));
+	}
 
 cancel_event:
 	ceph_osdc_cancel_event(notify_event);
@@ -4828,7 +4831,6 @@ static int rbd_obj_delete_sync(struct rbd_device *rbd_dev,
 
 	//obj_request->osd_req->r_priv = obj_request;
 
-	ceph_get_snap_context(rbd_dev->header.snapc);
 	osd_req_op_init(obj_request->osd_req, 0, CEPH_OSD_OP_DELETE, 0);
 	rbd_osd_req_format_snap_write(obj_request, rbd_dev->header.snapc);
 
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 8316a304af63..12841c5a09c7 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2942,6 +2942,7 @@ static void __do_event(struct ceph_osd_client *osdc, u8 opcode,
 			event->osd_req = NULL;
 		}
 		complete_all(&event->notify.complete);
+		ceph_osdc_put_event(event);
 	}
 	break;
 default:
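
For illustration only, here is a userspace sketch of the snapc imbalance in rbd_obj_delete_sync(). It is not kernel code: toy_snapc, toy_req and every toy_* function are hypothetical stand-ins, and the only thing assumed about the real code is what is said above, namely that building the request takes its own reference on the snap context. With that assumption, the extra get in the caller has no matching put and leaves one reference behind on every call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a refcounted snap context; not the kernel struct. */
struct toy_snapc {
	int refs;
};

static struct toy_snapc *toy_get(struct toy_snapc *sc)
{
	sc->refs++;
	return sc;
}

static void toy_put(struct toy_snapc *sc)
{
	assert(sc->refs > 0);
	sc->refs--;
}

/* Hypothetical request that pins the snapc for its own lifetime. */
struct toy_req {
	struct toy_snapc *snapc;
};

/* Assumption: building the request takes its own reference. */
static void toy_build_request(struct toy_req *req, struct toy_snapc *sc)
{
	req->snapc = toy_get(sc);
}

/* Tearing the request down drops only the reference it took. */
static void toy_release_request(struct toy_req *req)
{
	toy_put(req->snapc);
	req->snapc = NULL;
}

/*
 * Models one delete call. With extra_get != 0 the caller takes a
 * reference it never puts, as in the hunk the patch removes.
 * Returns the number of references left behind by this call.
 */
static int toy_delete_sync(struct toy_snapc *sc, int extra_get)
{
	int before = sc->refs;
	struct toy_req req;

	if (extra_get)
		toy_get(sc);	/* no matching toy_put() anywhere */

	toy_build_request(&req, sc);
	toy_release_request(&req);

	return sc->refs - before;
}
```

Calling toy_delete_sync() with extra_get set returns 1 each time, so the count on the toy snapc only ever grows; with extra_get clear it returns 0.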