On Wed, Sep 17, 2014 at 5:42 PM, Dan Van Der Ster
<daniel.vanderster at cern.ch> wrote:
> From: Florian Haas <florian at hastexo.com>
> Sent: Sep 17, 2014 5:33 PM
> To: Dan Van Der Ster
> Cc: Craig Lewis <clewis at centraldesktop.com>; ceph-users at lists.ceph.com
> Subject: Re: RGW hung, 2 OSDs using 100% CPU
>
> On Wed, Sep 17, 2014 at 5:24 PM, Dan Van Der Ster
> <daniel.vanderster at cern.ch> wrote:
>> Hi Florian,
>>
>>> On 17 Sep 2014, at 17:09, Florian Haas <florian at hastexo.com> wrote:
>>>
>>> Hi Craig,
>>>
>>> just dug this up in the list archives.
>>>
>>> On Fri, Mar 28, 2014 at 2:04 AM, Craig Lewis <clewis at centraldesktop.com>
>>> wrote:
>>>> In the interest of removing variables, I removed all snapshots on all
>>>> pools, then restarted all ceph daemons at the same time. This brought
>>>> up osd.8 as well.
>>>
>>> So just to summarize this: your 100% CPU problem at the time went away
>>> after you removed all snapshots, and the actual cause of the issue was
>>> never found?
>>>
>>> I am seeing a similar issue now, and have filed
>>> http://tracker.ceph.com/issues/9503 to make sure it doesn't get lost
>>> again. Can you take a look at that issue and let me know if anything
>>> in the description sounds familiar?
>>
>> Could your ticket be related to the snap trimming issue I've finally
>> narrowed down in the past couple of days?
>>
>> http://tracker.ceph.com/issues/9487
>>
>> Bump up debug_osd to 20, then check the log during one of your incidents.
>> If it is busy logging the snap_trimmer messages, then it's the same issue.
>> (The issue is that rbd pools have many purged_snaps, but sometimes after
>> backfilling a PG the purged_snaps list is lost, and thus the snap trimmer
>> becomes very busy whilst re-trimming thousands of snaps. During that time
>> (a few minutes on my cluster) the OSD is blocked.)
>
> That sounds promising, thank you! debug_osd=10 should actually be
> sufficient, as those snap_trim messages get logged at that level. :)
>
> Do I understand your issue report correctly in that you have found
> setting osd_snap_trim_sleep to be ineffective, because it's being
> applied when iterating from PG to PG, rather than from snap to snap?
> If so, then I'm guessing that can hardly be intentional...
>
> Cheers,
> Florian
>
> Hi,
> (Sorry for top posting, mobile now).

I've taken the liberty to reformat. :)

> That's exactly what I observe -- one sleep per PG. The problem is that the
> sleep can't simply be moved, since AFAICT the whole PG is locked for the
> duration of the trimmer. So the options I proposed are to limit the number
> of snaps trimmed per call to e.g. 16, or to fix the loss of purged_snaps
> after backfilling. Actually, probably both of those are needed. But a real
> dev would know better.

Okay. Certainly worth a try. Thanks again! I'll let you know when I know
more.

Cheers,
Florian
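
P.S. For anyone digging this thread out of the archives later: the OSD debug
level can be raised at runtime, no restart needed. A minimal example, assuming
osd.8 is the busy daemon and the default log location (adjust the id and path
for your cluster):

    # raise the debug level on the affected OSD
    ceph tell osd.8 injectargs '--debug-osd 10'

    # during the incident, look for snap trimmer activity in its log
    grep snap_trim /var/log/ceph/ceph-osd.8.log

    # put it back afterwards (0/5 is the usual default)
    ceph tell osd.8 injectargs '--debug-osd 0/5'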
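
P.P.S. To make the "one sleep per PG" point a bit more concrete, here is a
rough C++ sketch of the behaviour described above. All of the names (PG,
trim_all, trim_some, snaps_to_trim) are invented for illustration -- this is
not the actual Ceph trimmer code, just the shape of the problem and of the
"cap the snaps trimmed per call" idea:

    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    // Hypothetical stand-in for a placement group.
    struct PG {
        std::mutex pg_lock;                   // stands in for the PG lock
        std::vector<unsigned> snaps_to_trim;  // thousands of entries once
                                              // purged_snaps has been lost
        void trim_one_snap(unsigned snap) {
            std::printf("trimming snap %u\n", snap);  // real trimming does I/O
        }
    };

    // Behaviour as described in the thread: osd_snap_trim_sleep is applied
    // once per PG, so a PG that has to re-trim thousands of snaps is still
    // worked through in one go while its lock is held.
    void trim_all(std::vector<PG>& pgs, double snap_trim_sleep) {
        for (auto& pg : pgs) {
            {
                std::lock_guard<std::mutex> l(pg.pg_lock);  // PG blocked here
                for (unsigned s : pg.snaps_to_trim)
                    pg.trim_one_snap(s);       // no throttling inside the PG
                pg.snaps_to_trim.clear();
            }
            std::this_thread::sleep_for(       // the sleep only separates PGs
                std::chrono::duration<double>(snap_trim_sleep));
        }
    }

    // Dan's first suggestion, sketched: trim at most N snaps per call so the
    // PG lock is released between batches; a later pass finishes the rest.
    void trim_some(PG& pg, std::size_t max_per_call = 16) {
        std::lock_guard<std::mutex> l(pg.pg_lock);
        std::size_t n = std::min(max_per_call, pg.snaps_to_trim.size());
        for (std::size_t i = 0; i < n; ++i)
            pg.trim_one_snap(pg.snaps_to_trim[i]);
        pg.snaps_to_trim.erase(pg.snaps_to_trim.begin(),
                               pg.snaps_to_trim.begin() +
                                   static_cast<std::ptrdiff_t>(n));
    }

    int main() {
        std::vector<PG> pgs(1);
        for (unsigned s = 0; s < 40; ++s)
            pgs[0].snaps_to_trim.push_back(s);
        trim_some(pgs[0]);      // trims only 16, then releases the PG lock
        trim_all(pgs, 0.05);    // trims whatever is left, one sleep per PG
        return 0;
    }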