Re: Issues with exclusive-lock code on testing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Mar 2, 2016, at 6:26 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> 
> Hi Doug,
> 
> The cause of that memory corruption is a premature (duplicate, too)
> call to rbd_obj_request_complete() in the !object-map DELETE case.
> You've got:
> 
>    <dispatch>
>    rbd_osd_req_callback
>      rbd_osd_delete_callback
>        rbd_osd_discard_callback
>        rbd_obj_request_complete
>          <complete obj_request->completion>
>                                                <waiter is woken up>
>                                                ...
>                                                rbd_obj_request_put
>                                                <obj_request is gone>
>    <do more things with obj_request> <- !!!
>    rbd_obj_request_complete
>      <complete obj_request->completion>

Ah, yes, that is left over from before when I used the delete callback explicitly (assigning it to r_callback) instead of routing it through rbd_osd_req_callback like everything else. That likely explains it, thanks.

> I also spotted two memory leaks on the NOTIFY_COMPLETE path in
> __do_event().  The event one is trivial,

We cannot free the event here because CEPH_WATCH_EVENT_NOTIFY_COMPLETE fills values needed by the caller that may be waiting on it. The waiter has to call ceph_osdc_cancel_event when done with it.

> the page vector one I have
> a question about.  The data item is allocated in alloc_msg() and the
> actual buffer is then passed into __do_event() and eventually into
> rbd_send_async_notify(), but not further up the stack.  Is anything
> going to use it?  If not, we should remove it entirely.

The incoming message may contain the data item. In the case of CEPH_WATCH_EVENT_NOTIFY_COMPLETE, rbd does not expect any data from the OSD, only the return code. The protocol allows it to be present, though, so osd_client should handle this case.

> Another thing that caught my eye is your diff adds a bunch of
> ceph_get_snap_context() calls on header.snapc with no corresponding
> puts.  My understanding is the ones around rbd_image_request_fill() are
> there to workaround the fact that rbd_queue_workfn() isn't used,

Yes. I think I left a comment there, but I’ll add one if I did not.

> but
> the one in rbd_obj_delete_sync() is immediately followed by
> ceph_osdc_build_request() which bumps snapc and so is almost certainly
> a leak.

Yes. I thought you promised not to look at the garbage/junk commits, where that line is.

> The attached patch fixes the use-after-free and plugs those leaks.
> With it applied your test loop runs fine for me - no crashes or out of
> memory problems.
> 
> Thanks,
> 
>                Ilya
> <doug-testing.diff>

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux