I tried this one on the ceph-users list and didn't get any response; trying it here again after a slight edit:

---

Hello,

We recently upgraded from Luminous to Nautilus, and since the upgrade we have been seeing sporadic "lock-up" behavior on the RGW side. From the logs it seems to coincide with the rgw realm reloader: the reloader pauses the frontends, and for that period RGW is completely locked up and unable to take new requests. I believe the pause itself is expected behavior, but why does the realm reloader keep triggering? Is there a way to disable it or reduce its frequency? We are not using the multi-site feature (although we have a default realm), and we never change our realm configuration.

I captured the log below to see if I could catch anything related to 'watch'. Anyone?

--
2019-09-19 18:03:23.245 7f0bd5f5f700 1 rgw realm reloader: Resuming frontends with new realm configuration.
2019-09-19 18:03:23.245 7f2bd8f5d700 1 ====== starting new request req=0x7f2bd8f56950 =====
2019-09-19 18:03:23.245 7f2bd2750700 1 ====== starting new request req=0x7f2bd2749950 =====
2019-09-19 18:03:23.245 7f2bcaf41700 1 ====== starting new request req=0x7f2bcaf3a950 =====
2019-09-19 18:03:23.245 7f2bd074c700 1 ====== starting new request req=0x7f2bd0745950 =====
2019-09-19 18:03:23.245 7f2bc6f39700 1 ====== starting new request req=0x7f2bc6f32950 =====
2019-09-19 18:03:23.245 7f2bd5756700 1 ====== starting new request req=0x7f2bd574f950 =====
2019-09-19 18:03:23.245 7f2bc4f35700 1 ====== starting new request req=0x7f2bc4f2e950 =====
--
2019-09-19 18:05:41.588 7f2bd2750700 2 req 121303 0.001s s3:get_obj verifying op params
2019-09-19 18:05:41.588 7f2bd2750700 2 req 121303 0.001s s3:get_obj pre-executing
2019-09-19 18:05:41.588 7f2bd2750700 2 req 121303 0.001s s3:get_obj executing
2019-09-19 18:05:41.588 7f2bd2750700 2 req 121303 0.001s s3:get_obj completing
2019-09-19 18:05:41.588 7f2bd2750700 2 req 121303 0.001s s3:get_obj op status=0
2019-09-19 18:05:41.588 7f2bd2750700 2 req 121303 0.001s s3:get_obj http status=200
2019-09-19 18:05:41.588 7f2bd2750700 1 ====== req done req=0x7f2bd2749950 op status=0 http_status=200 latency=0.001s ======
2019-09-19 18:05:41.588 7f2bd2750700 1 civetweb: 0x7f2c36d6f3a8: 168.245.88.23 - - [19/Sep/2019:18:05:39 +0000] "GET /kamta-incoming/filter0190p3mdw1-28304-5D83C374-LIfsCi4hQiGpa_pXeRjZ-A HTTP/1.1" 200 17578 - Minio (linux; amd64) minio-go/v6.0.17
2019-09-19 18:05:41.589 7f0bd1f57700 4 rgw period pusher: No zones to update
2019-09-19 18:05:41.589 7f0bd1f57700 4 rgw realm reloader: Notification on realm, reconfiguration scheduled
2019-09-19 18:05:41.589 7f0bd5f5f700 1 rgw realm reloader: Pausing frontends for realm update...
2019-09-19 18:05:41.590 7f2bd4754700 2 req 121302 0.003s s3:put_obj completing
2019-09-19 18:05:41.590 7f2bd4754700 2 req 121302 0.003s s3:put_obj op status=0
2019-09-19 18:05:41.590 7f2bd4754700 2 req 121302 0.003s s3:put_obj http status=200
2019-09-19 18:05:41.590 7f2bd4754700 1 ====== req done req=0x7f2bd474d950 op status=0 http_status=200 latency=0.003s ======
2019-09-19 18:05:41.590 7f0bd5f5f700 4 rgw period pusher: paused for realm update
2019-09-19 18:05:41.590 7f2bd4754700 1 civetweb: 0x7f2c36d6cc48: 10.25.78.148 - - [19/Sep/2019:18:05:37 +0000] "PUT /mta-incoming/filter0187p3mdw1-18517-5D83C374-4D-50-15C5E919A6F32260 HTTP/1.1" 200 216 - Minio (linux; amd64) minio-go/v6.0.17
2019-09-19 18:05:41.590 7f0bd5f5f700 1 rgw realm reloader: Frontends paused
2019-09-19 18:05:41.590 7f2bdf76a700 5 completion_mgr.get_next() returned ret=-125
2019-09-19 18:05:41.590 7f2bdf76a700 5 run(): was stopped, exiting
2019-09-19 18:05:41.606 7f0bd5f5f700 2 removed watcher, disabling cache
2019-09-19 18:05:41.646 7f0bd5f5f700 1 rgw realm reloader: Store closed
2019-09-19 18:05:42.173 7f0bd5f5f700 2 all 8 watchers are set, enabling cache
2019-09-19 18:05:42.217 7f2bf4f95700 2 RGWDataChangesLog::ChangesRenewThread: start
2019-09-19 18:05:42.218 7f2bf4794700 2 garbage collection: garbage collection: start
2019-09-19 18:05:42.218 7f2bf3f93700 2 object expiration: start
2019-09-19 18:05:42.226 7f2bf3f93700 5 process_single_shard(): failed to acquire lock on obj_delete_at_hint.0000000001
2019-09-19 18:05:42.226 7f2bdd766700 2 lifecycle: life cycle: start
2019-09-19 18:05:42.226 7f2bdc764700 5 ERROR: sync_all_users() returned ret=-2
2019-09-19 18:05:42.227 7f0bd5f5f700 1 rgw realm reloader: Creating new store
2019-09-19 18:05:42.227 7f0bd5f5f700 1 mgrc service_daemon_register rgw.radosgw.gateway metadata {arch=x86_64,ceph_release=nautilus,ceph_version=ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable),ceph_version_short=14.2.2,cpu=Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz,distro=centos,distro_description=CentOS Linux 7 (Core),distro_version=7,frontend_config#0=civetweb port=50680 num_threads=16384 error_log_file=/var/log/radosgw/civetweb.error.log access_log_file=/var/log/radosgw/civetweb.access.log,frontend_type#0=civetweb,hostname=cephrgw0026p3mdw1.sendgrid.net,kernel_description=#1 SMP Tue Nov 22 16:42:41 UTC 2016,kernel_version=3.10.0-514.el7.x86_64,mem_swap_kb=12582908,mem_total_kb=197802260,num_handles=5,os=Linux,pid=156587,zone_id=6ee46730-8dcd-4d15-aaa2-4e9be2a11096,zone_name=us-west-1-zone,zonegroup_id=9eeec9c9-fcba-445d-a913-6069617524d1,zonegroup_name=us-west-1}
2019-09-19 18:05:42.227 7f0bd5f5f700 1 rgw realm reloader: Finishing initialization of new store
2019-09-19 18:05:42.227 7f0bd5f5f700 1 rgw realm reloader: - REST subsystem init
2019-09-19 18:05:42.227 7f0bd5f5f700 1 rgw realm reloader: - user subsystem init
2019-09-19 18:05:42.227 7f0bd5f5f700 1 rgw realm reloader: - user subsystem init
2019-09-19 18:05:42.227 7f0bd5f5f700 1 rgw realm reloader: - usage subsystem init
2019-09-19 18:05:42.227 7f0bd5f5f700 1 rgw realm reloader: Resuming frontends with new realm configuration.
2019-09-19 18:05:42.227 7f2bd174e700 1 ====== starting new request req=0x7f2bd1747950 =====
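In case it's useful, here is what I'm planning to check next. This is only my guess from skimming rgw_realm_watcher.cc: the reloader appears to register a watch on a control object in the root pool (something like realms.<realm-id>.control), so a notify on that object from anywhere would trigger exactly the pause/resume cycle above. Commands below assume the default .rgw.root root pool; <realm-id> is a placeholder for our realm's id:

  # find the control object the realm watcher registers on
  rados -p .rgw.root ls | grep -i control

  # list the clients holding a watch on it; every running radosgw should show up here
  rados -p .rgw.root listwatchers realms.<realm-id>.control

  # confirm the realm/period configuration really isn't changing underneath us
  radosgw-admin realm list
  radosgw-admin period get

If the realm and period are indeed static, my working theory is that something (perhaps a script or monitoring job running radosgw-admin) is still sending notifies on that control object, so I'll try correlating the timestamps above with anything that touches .rgw.root.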