On Wed, Feb 18, 2015 at 9:19 PM, Florian Haas wrote:
>> Hey everyone,
>>
>> I must confess I'm still not fully understanding this problem and don't
>> exactly know where to start digging deeper, but perhaps other users have
>> seen this and/or it rings a bell.
>>
>> System info: Ceph giant on CentOS 7; approx. 240 OSDs, 6 pools using 2
>> different rulesets, where the problem applies to hosts and PGs using a
>> bog-standard default crushmap.
>>
>> Symptom: out of the blue, ceph-osd processes on a single OSD node start
>> going to 100% CPU utilization. The problem turns so bad that the machine
>> effectively becomes CPU bound and can't cope with any client requests
>> anymore. Stopping and restarting all OSDs brings the problem right back,
>> as does rebooting the machine: right after the ceph-osd processes start,
>> CPU utilization shoots up again. Stopping and marking out several OSDs on
>> the machine makes the problem go away, but obviously causes massive
>> backfilling. All the logs show, while CPU utilization is implausibly high,
>> are slow requests (which would be expected in a system that can barely do
>> anything).
>>
>> Now I've seen issues like this before on dumpling and firefly, but besides
>> the fact that they have all been addressed and should now be fixed, they
>> always involved the prior mass removal of RBD snapshots. This system only
>> used a handful of snapshots in testing, and is presently not using any
>> snapshots at all.
>>
>> I'll be spending some time looking for clues in the log files of the OSDs
>> whose shutdown made the problem go away, but if this sounds familiar to
>> anyone willing to offer clues, I'd be more than interested. :) Thanks!
>>
>> Cheers,
>> Florian
>
> Dan vd Ster was kind enough to pitch in an incredibly helpful off-list
> reply, which I am taking the liberty to paraphrase here:
>
> That "mysterious" OSD madness seems to be caused by NUMA zone reclaim,
> which is enabled by default on Intel machines with recent kernels. It can
> be disabled as follows:
>
> echo 0 > /proc/sys/vm/zone_reclaim_mode
>
> or of course with "sysctl -w vm.zone_reclaim_mode=0" or the corresponding
> sysctl.conf entry.
>
> On the machines affected, that seems to have removed the CPU pegging
> issue; at least it has not reappeared for several days now.
>
> Dan and Sage discussed the issue recently in this thread:
> http://www.spinics.net/lists/ceph-users/msg14914.html
>
> Thanks a million to Dan.

I'm looking into the original issue Florian describes above. It seems that
unsetting zone_reclaim_mode wasn't the magical fix we hoped for.
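(For anyone who wants to rule out zone reclaim on their own nodes first,
checking and persisting the setting looks roughly like the following; treat
it as a sketch, and note the sysctl.d file name is just an example:)

# current value; 0 means zone reclaim is disabled
cat /proc/sys/vm/zone_reclaim_mode

# disable it now, and keep it disabled across reboots
sysctl -w vm.zone_reclaim_mode=0
echo "vm.zone_reclaim_mode = 0" > /etc/sysctl.d/99-zone-reclaim.conf

# confirm the box actually has more than one NUMA node
numactl --hardware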
After a couple of weeks, we're seeing pegged CPUs again, but this time we
managed to get a perf top snapshot of it happening. These are the topmost
(ahem) lines:

  8.33%  [kernel]               [k] _raw_spin_lock
  3.14%  perf                   [.] 0x00000000000da124
  2.58%  [unknown]              [.] 0x00007f8a2901042d
  1.85%  libpython2.7.so.1.0    [.] 0x000000000006dac2
  1.61%  libc-2.17.so           [.] __memcpy_ssse3_back
  1.54%  perf                   [.] dso__find_symbol
  1.44%  libc-2.17.so           [.] __strcmp_sse42
  1.41%  libpython2.7.so.1.0    [.] PyEval_EvalFrameEx
  1.25%  [kernel]               [k] native_write_msr_safe
  1.24%  perf                   [.] hists__output_resort
  1.11%  libleveldb.so.1.0.7    [.] 0x000000000003cde8
  0.86%  perf                   [.] perf_evsel__parse_sample
  0.81%  libtcmalloc.so.4.1.2   [.] operator new(unsigned long)
  0.76%  libpython2.7.so.1.0    [.] PyEval_EvalFrameEx
  0.73%  [kernel]               [k] apic_timer_interrupt
  0.71%  [kernel]               [k] page_fault
  0.71%  [kernel]               [k] _raw_spin_lock_irqsave
  0.62%  libpthread-2.17.so     [.] pthread_mutex_unlock
  0.62%  libc-2.17.so           [.] __memcmp_sse4_1
  0.61%  libc-2.17.so           [.] _int_malloc
  0.60%  perf                   [.] rb_next
  0.58%  [kernel]               [k] clear_page_c_e
  0.56%  [kernel]               [k] tg_load_down

The server in question was booted without any OSDs. A few were started after
invoking 'perf top', and during that run the CPUs were saturated.

Any ideas?

Cheers!
Adolfo
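P.S. In case it helps anyone reproduce or dig further, something along these
lines should capture a call-graph profile of just the ceph-osd processes (a
rough sketch; the 30-second window is arbitrary, and installing the relevant
debuginfo packages should help resolve the raw hex addresses above):

# record ~30 seconds of call-graph samples from all running ceph-osd processes
perf record -g -p "$(pgrep -d, ceph-osd)" -- sleep 30

# summarize by process, library and symbol
perf report --sort comm,dso,symbol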