Re: Issues with exclusive-lock code on testing


 



On Wed, Mar 2, 2016 at 2:59 PM, Douglas Fuller <dfuller@xxxxxxxxxx> wrote:
>
>> On Mar 2, 2016, at 6:26 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>
>> Hi Doug,
>>
>> The cause of that memory corruption is a premature (duplicate, too)
>> call to rbd_obj_request_complete() in the !object-map DELETE case.
>> You've got:
>>
>>    <dispatch>
>>    rbd_osd_req_callback
>>      rbd_osd_delete_callback
>>        rbd_osd_discard_callback
>>        rbd_obj_request_complete
>>          <complete obj_request->completion>
>>                                                <waiter is woken up>
>>                                                ...
>>                                                rbd_obj_request_put
>>                                                <obj_request is gone>
>>    <do more things with obj_request> <- !!!
>>    rbd_obj_request_complete
>>      <complete obj_request->completion>
>
> Ah, yes, that is left over from before, when I used the delete callback explicitly (assigning it to r_callback) instead of routing it through rbd_osd_req_callback like everything else. That likely explains it, thanks.
>
>> I also spotted two memory leaks on the NOTIFY_COMPLETE path in
>> __do_event().  The event one is trivial,
>
> We cannot free the event here because CEPH_WATCH_EVENT_NOTIFY_COMPLETE fills in values needed by a caller that may be waiting on it. The waiter has to call ceph_osdc_cancel_event() when it is done with the event.

My ceph_osdc_put_event() doesn't free the event, it just puts a ref
that was acquired above with get_event().

>
>> the page vector one I have
>> a question about.  The data item is allocated in alloc_msg() and the
>> actual buffer is then passed into __do_event() and eventually into
>> rbd_send_async_notify(), but not further up the stack.  Is anything
>> going to use it?  If not, we should remove it entirely.
>
> The incoming message may contain a data item. In the case of CEPH_WATCH_EVENT_NOTIFY_COMPLETE, rbd does not expect any data from the OSD, only the return code. The protocol allows data to be present, though, so osd_client should handle that case.

If there won't be any users of it on the kernel client side, there's no
point in allocating the data item, extracting the data and propagating
it halfway up the stack, only to discard the data and free the data item.

>
>> Another thing that caught my eye is your diff adds a bunch of
>> ceph_get_snap_context() calls on header.snapc with no corresponding
>> puts.  My understanding is the ones around rbd_image_request_fill() are
>> there to workaround the fact that rbd_queue_workfn() isn't used,
>
> Yes. I think I left a comment there, but I’ll add one if I did not.
>
>> but
>> the one in rbd_obj_delete_sync() is immediately followed by
>> ceph_osdc_build_request() which bumps snapc and so is almost certainly
>> a leak.
>
> Yes. I thought you promised not to look at the garbage/junk commits, where that line is.

I looked at the entire diff - it was blowing past 1G of memory, after
all ;)

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


