On Aug 22, 2019, at 9:12 PM, Eric Ivancich <ivancich@xxxxxxxxxx> wrote:
Thank you for providing the profiling data, Vladimir. There are 5078
threads and most of them are waiting. Here is a list of the deepest
call of each thread with duplicates removed.
+ 100.00% epoll_wait
+ 100.00% get_obj_data::flush(rgw::OwningList<rgw::AioResultEntry>&&)
+ 100.00% poll
+ 100.00% poll
+ 100.00% poll
+ 100.00% pthread_cond_timedwait@@GLIBC_2.3.2
+ 100.00% pthread_cond_timedwait@@GLIBC_2.3.2
+ 100.00% pthread_cond_wait@@GLIBC_2.3.2
+ 100.00% pthread_cond_wait@@GLIBC_2.3.2
+ 100.00% read
+ 100.00% _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JEEEEUlRKT_RKT0_E_EEvSH_
The only interesting ones are the second and last:
* get_obj_data::flush(rgw::OwningList<rgw::AioResultEntry>&&)
* _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JEEEEUlRKT_RKT0_E_EEvSH_
The latter demangles to boost::intrusive::list<rgw::AioResultEntry>::sort() instantiated with a lambda comparator defined inside get_obj_data::flush(). They are essentially part of the same call stack that results from processing a GetObj request, and five threads are in this call stack (the only difference is whether or not they include the call into the boost intrusive list). Here’s the full call stack of those threads, with a short sketch of that boost frame after it:
+ 100.00% clone
  + 100.00% start_thread
    + 100.00% worker_thread
      + 100.00% process_new_connection
        + 100.00% handle_request
          + 100.00% RGWCivetWebFrontend::process(mg_connection*)
            + 100.00% process_request(RGWRados*, RGWREST*, RGWRequest*, std::string const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, int*)
              + 100.00% rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)
                + 100.00% RGWGetObj::execute()
                  + 100.00% RGWRados::Object::Read::iterate(long, long, RGWGetDataCB*)
                    + 100.00% RGWRados::iterate_obj(RGWObjectCtx&, RGWBucketInfo const&, rgw_obj const&, long, long, unsigned long, int (*)(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*), void*)
                      + 100.00% _get_obj_iterate_cb(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*)
                        + 100.00% RGWRados::get_obj_iterate_cb(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*)
                          + 100.00% get_obj_data::flush(rgw::OwningList<rgw::AioResultEntry>&&)
                            + 100.00% _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JEEEEUlRKT_RKT0_E_EEvSH_
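For readability, here is a minimal sketch of what that bottom frame corresponds to: a boost::intrusive::list of rgw::AioResultEntry nodes being sorted with a lambda comparator. This is not the actual radosgw code; the node layout and the comparison key below are assumptions for illustration only. The point is that boost::intrusive::list::sort() is an in-memory merge sort, so a thread sampled in this frame is doing CPU work rather than waiting on RADOS I/O.

// Minimal sketch; not the actual radosgw code. The "id" field and the
// comparator key are assumptions standing in for whatever flush() sorts by.
#include <boost/intrusive/list.hpp>
#include <cstdint>
#include <iostream>

struct AioResultEntry : boost::intrusive::list_base_hook<> {
  uint64_t id;   // stand-in ordering key
  explicit AioResultEntry(uint64_t i) : id(i) {}
};

using ResultList = boost::intrusive::list<AioResultEntry>;

int main() {
  AioResultEntry a{3}, b{1}, c{2};

  ResultList completed;        // intrusive list: the links live inside the nodes
  completed.push_back(a);
  completed.push_back(b);
  completed.push_back(c);

  // Shape of the mangled frame in the profile: list<...>::sort() instantiated
  // with a lambda defined in the caller (here main(); in radosgw, get_obj_data::flush()).
  completed.sort([](const AioResultEntry& lhs, const AioResultEntry& rhs) {
    return lhs.id < rhs.id;
  });

  for (const auto& e : completed)
    std::cout << e.id << ' ';  // prints: 1 2 3
  std::cout << '\n';

  completed.clear();           // unlink the nodes before they go out of scope
}

If flush() ends up re-sorting such a list on every AIO completion, the cost would grow with the number of pending entries, which would be consistent with sustained CPU use while almost no data moves over the network.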
So this isn’t background processing but request processing. I’m not
clear why these requests are consuming so much CPU for so long.
From your initial message:
I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically, the radosgw process on those machines starts consuming 100% of 5 CPU cores for days at a time, even though the machine is not being used for data transfers (nothing in the radosgw logs, a couple of KB/s of network traffic).
This situation can affect any number of our rados gateways, lasts from a few hours to a few days, and stops either on its own or when the radosgw process is restarted.
I’m going to check with others who’re more familiar with this code path.
Begin forwarded message:
*From:* Vladimir Brik <vladimir.brik@xxxxxxxxxxxxxxxx>
*Subject:* Re: radosgw pegging down 5 CPU cores when no data is being transferred
*Date:* August 21, 2019 at 4:47:01 PM EDT
*To:* "J. Eric Ivancich" <ivancich@xxxxxxxxxx>, Mark Nelson <mnelson@xxxxxxxxxx>, ceph-users@xxxxxxxxxxxxxx
> Are you running multisite?
No
> Do you have dynamic bucket resharding turned on?
Yes. "radosgw-admin reshard list" prints "[]"
> Are you using lifecycle?
I am not sure. How can I check? "radosgw-admin lc list" says "[]"
> And just to be clear -- sometimes all 3 of your rados gateways are
> simultaneously in this state?
Multiple, but I have not seen all 3 in this state simultaneously. Currently one gateway has 1 thread using 100% of a CPU core, and another has 5 threads each using 100% of a core.
Here are the fruits of my attempts to capture the call graph using
perf and gdbpmp:
https://icecube.wisc.edu/~vbrik/perf.data
https://icecube.wisc.edu/~vbrik/gdbpmp.data
These are the commands that I ran and their outputs (note that I could not get perf to run without producing the warning below):
rgw-3 gdbpmp # ./gdbpmp.py -n 100 -p 73688 -o gdbpmp.data
Attaching to process 73688...Done.
Gathering
Samples....................................................................................................
Profiling complete with 100 samples.
rgw-3 ~ # perf record --call-graph fp -p 73688 -- sleep 10
[ perf record: Woken up 54 times to write data ]
Warning:
Processed 574207 events and lost 4 chunks!
Check IO/CPU overload!
[ perf record: Captured and wrote 58.866 MB perf.data (233750 samples) ]
Eric