Re: Memory leak in radosgw

Ken Dreyer <kdreyer@xxxxxxxxxx> · Mon, 24 Oct 2016 12:29:30 -0600

Hi Trey,

If you run the upstream curl releases, please note that curl has a
poor security record and it's important to stay on top of updates.
https://curl.haxx.se/docs/security.html indicates that 7.44 has
security problems, and in fact there are eleven more security
announcements coming soon
(https://curl.haxx.se/mail/lib-2016-10/0076.html).

If you could provide us more information about the memory leak you're
seeing, we can coordinate that with the curl maintainers in RHEL and
see if it's feasible to get a fix into RHEL's 7.29.
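
One rough way to collect that kind of information is to sample the
resident memory of the radosgw process over time and report how fast it
grows. The Python sketch below is only an illustration, not an existing
Ceph or RHEL tool: the function names are hypothetical, it assumes a Linux
/proc filesystem, and the 60-second sampling interval is an arbitrary
choice.

```python
import re
import sys
import time


def parse_vmrss_kib(status_text):
    """Return the VmRSS value (in kB as reported by the kernel) found in
    the text of /proc/<pid>/status, or None if the field is absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kib(status_file.read())


def growth_kib_per_min(samples):
    """Average growth between the first and last (seconds, rss_kib) sample."""
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (rss1 - rss0) * 60.0 / (t1 - t0)


if __name__ == "__main__":
    # Usage (hypothetical script name): python rss_watch.py <radosgw pid>
    pid = int(sys.argv[1])
    samples = []
    while True:
        rss = read_rss_kib(pid)
        if rss is None:
            break  # no VmRSS field, e.g. a kernel thread
        samples.append((time.time(), rss))
        print(f"rss={rss} kB  growth={growth_kib_per_min(samples):.1f} kB/min")
        time.sleep(60)  # assumed interval; adjust as needed
```

A steadily positive growth figure while multisite sync is running, together
with the exact libcurl package version, would be a useful starting point
for a bug report.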

- Ken

On Mon, Oct 24, 2016 at 10:31 AM, Trey Palmer <trey@xxxxxxxxxxxxx> wrote:
> Updating to libcurl 7.44 fixed the memory leak issue.   Thanks for the tip,
> Ben.
>
> FWIW this was a massive memory leak; it rendered the system untenable in my
> testing.   RGW multisite will flat-out not work with the current CentOS/RHEL7
> libcurl.
>
> Seems like there are a lot of different problems caused by libcurl
> bugs/incompatibilities.
>
>    -- Trey
>
> On Fri, Oct 21, 2016 at 11:04 AM, Trey Palmer <trey@xxxxxxxxxxxxx> wrote:
>>
>> Hi Ben,
>>
>> I previously hit this bug:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1327142
>>
>> So I updated from libcurl 7.29.0-25 to the new update package libcurl
>> 7.29.0-32 on RHEL 7, which fixed the deadlock problem.
>>
>> I had not seen the issue you linked.   It doesn't seem directly related,
>> since my problem is a memory leak and not CPU.   Clearly, though, older
>> libcurl versions remain problematic for multiple reasons, so I'll give a
>> newer one a try.
>>
>> Thanks for the input!
>>
>>    -- Trey
>>
>>
>>
>> On Fri, Oct 21, 2016 at 3:21 AM, Ben Morrice <ben.morrice@xxxxxxx> wrote:
>>>
>>> What version of libcurl are you using?
>>>
>>> I was hitting this bug with RHEL7/libcurl 7.29 which could also be your
>>> catalyst.
>>>
>>> http://tracker.ceph.com/issues/15915
>>>
>>> Kind regards,
>>>
>>> Ben Morrice
>>>
>>> ______________________________________________________________________
>>> Ben Morrice | e: ben.morrice@xxxxxxx | t: +41-21-693-9670
>>> EPFL ENT CBS BBP
>>> Biotech Campus
>>> Chemin des Mines 9
>>> 1202 Geneva
>>> Switzerland
>>>
>>> On 20/10/16 21:41, Trey Palmer wrote:
>>>
>>> I've been trying to test radosgw multisite and have a pretty bad memory
>>> leak.    It appears to be associated only with multisite sync.
>>>
>>> Multisite works well for a small number of objects.    However, it all
>>> fell over when I wrote 8M 64K objects to two buckets overnight for
>>> testing (via cosbench).
>>>
>>> The leak appears to happen on the multisite transfer source -- that is,
>>> the node where the objects were written originally.   The radosgw process
>>> eventually dies, I'm sure via the OOM killer, and systemd restarts it.
>>> Then repeat, though multisite sync pretty much stops at that point.
>>>
>>> I have tried 10.2.2, 10.2.3 and a combination of the two.   I'm running
>>> on CentOS 7.2, using civetweb with SSL.   I saw that the memory profiler
>>> only works on mon, osd and mds processes.
>>>
>>> Anyone else seen anything like this?
>>>
>>>    -- Trey
>>>
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>