Re: radosgw pegging down 5 CPU cores when no data is being transferred

Paul Emmerich <paul.emmerich@xxxxxxxx> · Wed, 21 Aug 2019 16:26:56 +0200

On Wed, Aug 21, 2019 at 3:55 PM Vladimir Brik
<vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
>
> Hello
>
> I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically,
> the radosgw process on those machines starts consuming 100% of 5 CPU
> cores for days at a time, even though the machine is not being used for
> data transfers (nothing in the radosgw logs, a couple of KB/s of network
> traffic).
>
> This situation can affect any number of our rados gateways, lasts from
> a few hours to a few days, and stops either when the radosgw process is
> restarted or on its own.
>
> Does anybody have an idea what might be going on or how to debug it? I
> don't see anything obvious in the logs. perf top shows that the CPU is
> consumed by the radosgw shared object in the symbol
> get_obj_data::flush, which, if I interpret things correctly, is called
> from a symbol with a long name that contains the substring
> "boost9intrusive9list_impl".
>
> This is our configuration:
> rgw_frontends = civetweb num_threads=5000 port=443s
> ssl_certificate=/etc/ceph/rgw.crt
> error_log_file=/var/log/ceph/civetweb.error.log

Probably unrelated to your problem, but running with that many threads
is usually a sign that the asynchronous beast frontend would be a
better fit for your setup.
(That said, the code you see in perf should not be related to the
frontend.)
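
For illustration, a rough sketch of what the equivalent beast setting
might look like. This is untested against your setup: it assumes the
same certificate path as above and that /etc/ceph/rgw.crt also contains
the private key (otherwise ssl_private_key would have to point at the
key file). Beast has no num_threads option; its worker count comes from
rgw_thread_pool_size instead, which is left at its default here.

```
# Assumed equivalent of the civetweb line above, not a verified config.
rgw_frontends = beast ssl_port=443 ssl_certificate=/etc/ceph/rgw.crt
```

The radosgw process needs a restart to pick up a frontend change.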

Paul

>
> (error log file doesn't exist)
>
>
> Thanks,
>
> Vlad
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com