On Wed, Aug 21, 2019 at 3:55 PM Vladimir Brik <vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
>
> Hello
>
> I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically,
> the radosgw process on those machines starts consuming 100% of 5 CPU
> cores for days at a time, even though the machine is not being used for
> data transfers (nothing in the radosgw logs, a couple of KB/s of network
> traffic).
>
> This situation can affect any number of our rados gateways, lasts from
> a few hours to a few days, and stops either when the radosgw process is
> restarted or on its own.
>
> Does anybody have an idea what might be going on or how to debug it? I
> don't see anything obvious in the logs. Perf top says that the CPU is
> consumed by the radosgw shared object in symbol get_obj_data::flush,
> which, if I interpret things correctly, is called from a symbol with a
> long name that contains the substring "boost9intrusive9list_impl"
>
> This is our configuration:
> rgw_frontends = civetweb num_threads=5000 port=443s
> ssl_certificate=/etc/ceph/rgw.crt
> error_log_file=/var/log/ceph/civetweb.error.log

Probably unrelated to your problem, but running with lots of threads is
usually an indicator that the async beast frontend would be a better
fit for your setup.
(But the code you see in perf should not be related to the frontend.)

Paul

>
> (error log file doesn't exist)
>
>
> Thanks,
>
> Vlad
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
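
For reference, a minimal sketch of what the beast equivalent of the
quoted civetweb line might look like. This is an untested illustration,
not a configuration taken from this thread. It assumes the same
certificate path as the quoted setup and that /etc/ceph/rgw.crt is a PEM
file holding both the certificate and the private key (beast uses the
ssl_certificate file as the key when ssl_private_key is not given).
There is no num_threads option here; as far as I know, beast sizes its
worker pool from rgw_thread_pool_size instead.

```
# Assumption: same certificate path as in the quoted civetweb setup
rgw_frontends = beast ssl_port=443 ssl_certificate=/etc/ceph/rgw.crt
```

The radosgw process has to be restarted for a frontend change to take
effect.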