Hello,
I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically,
the radosgw process on those machines starts consuming 100% of 5 CPU
cores for days at a time, even though the machine is not being used for
data transfers (nothing in the radosgw logs, a couple of KB/s of network
traffic).
This can affect any number of our rados gateways, lasts from a few hours
to a few days, and stops either on its own or when the radosgw process
is restarted.
Does anybody have an idea what might be going on, or how to debug it? I
don't see anything obvious in the logs. perf top shows that the CPU time
is spent in the radosgw shared object, in the symbol get_obj_data::flush,
which, if I interpret things correctly, is called from a symbol with a
long name that contains the substring "boost9intrusive9list_impl".
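In case it helps narrow this down, here is a minimal sketch (Python, Linux only) of how per-thread CPU usage of a process such as radosgw could be sampled from /proc, to see whether the load sits in a handful of threads. It is not part of any Ceph tooling; the helper names and the 5-second interval are arbitrary choices made for illustration.

```python
import os
import sys
import time


def parse_stat(line):
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The name is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split around the first "(" and the last ")".
    start = line.index("(")
    end = line.rindex(")")
    name = line[start + 1:end]
    fields = line[end + 1:].split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, which are indexes 11 and 12 in this list.
    return name, int(fields[11]) + int(fields[12])


def sample(pid):
    """Map thread id -> (name, cpu_ticks) for every thread of the process."""
    result = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                result[int(tid)] = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited between listdir() and open()
    return result


def busiest(before, after, count=10):
    """Return [(tid, name, tick_delta)], largest CPU consumers first."""
    deltas = []
    for tid, (name, ticks) in after.items():
        if tid in before:  # skip threads that started after the first sample
            deltas.append((tid, name, ticks - before[tid][1]))
    deltas.sort(key=lambda item: item[2], reverse=True)
    return deltas[:count]


if __name__ == "__main__" and len(sys.argv) == 2:
    pid = int(sys.argv[1])          # pid of the process to inspect
    interval = 5.0                  # seconds between samples (arbitrary)
    first = sample(pid)
    time.sleep(interval)
    second = sample(pid)
    hz = os.sysconf("SC_CLK_TCK")   # clock ticks per second
    for tid, name, delta in busiest(first, second):
        percent = 100.0 * delta / hz / interval
        print("%8d %-16s %5.1f%% CPU" % (tid, name, percent))
```

It would be run as `python3 thread_cpu.py <pid of radosgw>` (the file name is hypothetical). The thread ids it prints can then be matched against `top -H` output or a debugger backtrace of the same process.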
This is our configuration:
rgw_frontends = civetweb num_threads=5000 port=443s ssl_certificate=/etc/ceph/rgw.crt error_log_file=/var/log/ceph/civetweb.error.log
(the configured error log file doesn't exist)
Thanks,
Vlad