Playing with multi-site zones for the Ceph Object Gateway.

Ceph version: 17.2.5

My setup: a 3-zone multi-site in 3-way full sync mode; each zone has 3 machines, each running RGW+MON+OSD.

Load test: 3000 concurrent uploads of 1M objects.

After about 3-4 minutes of load, the RGW machines get stuck: in 2 of the 3 zones, RGW stops responding (e.g. to `curl $RGW:80`). Attempting to restart RGW ends with `Initialization timeout, failed to initialize`.

Here is a gdb backtrace showing where it hangs after the restart:

(gdb) inf thr
  Id   Target Id                                   Frame
* 1    Thread 0x7fa7d3abbcc0 (LWP 30791) "radosgw" futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffc7f7a2438) at ../sysdeps/nptl/futex-internal.h:183
...
(gdb) bt
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffc7f7a2438) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7ffc7f7a2488, cond=0x7ffc7f7a2410) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x7ffc7f7a2410, mutex=0x7ffc7f7a2488) at pthread_cond_wait.c:647
#3  0x00007fa7d7097e42 in ceph::condition_variable_debug::wait (this=this@entry=0x7ffc7f7a2410, lock=...) at ../src/common/mutex_debug.h:148
#4  0x00007fa7d7953cba in ceph::condition_variable_debug::wait<librados::IoCtxImpl::operate(const object_t&, ObjectOperation*, ceph::real_time*, int)::<lambda()> > (pred=..., lock=..., this=0x7ffc7f7a2410) at ../src/librados/IoCtxImpl.cc:672
#5  librados::IoCtxImpl::operate (this=this@entry=0x558347c21010, oid=..., o=0x558347e12310, pmtime=<optimized out>, flags=<optimized out>) at ../src/librados/IoCtxImpl.cc:672
#6  0x00007fa7d792bd55 in librados::v14_2_0::IoCtx::operate (this=this@entry=0x558347e44760, oid="notify.0", o=o@entry=0x7ffc7f7a2690, flags=flags@entry=0) at ../src/librados/librados_cxx.cc:1536
#7  0x00007fa7d9490ad1 in rgw_rados_operate (dpp=<optimized out>, ioctx=..., oid="notify.0", op=op@entry=0x7ffc7f7a2690, y=..., flags=0) at ../src/rgw/rgw_tools.cc:277
#8  0x00007fa7d9627e0f in RGWSI_RADOS::Obj::operate (this=this@entry=0x558347e44710, dpp=<optimized out>, op=op@entry=0x7ffc7f7a2690, y=..., flags=flags@entry=0) at ../src/rgw/services/svc_rados.h:112
#9  0x00007fa7d96209a5 in RGWSI_Notify::init_watch (this=this@entry=0x558347c49530, dpp=<optimized out>, y=...) at ../src/rgw/services/svc_notify.cc:214
#10 0x00007fa7d962161b in RGWSI_Notify::do_start (this=0x558347c49530, y=..., dpp=<optimized out>) at ../src/rgw/services/svc_notify.cc:277
#11 0x00007fa7d8f17bcf in RGWServiceInstance::start (this=0x558347c49530, y=..., dpp=<optimized out>) at ../src/rgw/rgw_service.cc:331
#12 0x00007fa7d8f1a260 in RGWServices_Def::init (this=this@entry=0x558347de90a0, cct=<optimized out>, have_cache=<optimized out>, raw=raw@entry=false, run_sync=<optimized out>, y=..., dpp=<optimized out>) at /usr/include/c++/9/bits/unique_ptr.h:360
#13 0x00007fa7d8f1cc40 in RGWServices::do_init (this=this@entry=0x558347de90a0, _cct=<optimized out>, have_cache=<optimized out>, raw=raw@entry=false, run_sync=<optimized out>, y=..., dpp=<optimized out>) at ../src/rgw/rgw_service.cc:284
#14 0x00007fa7d92a7b1f in RGWServices::init (dpp=<optimized out>, y=..., run_sync=<optimized out>, have_cache=<optimized out>, cct=<optimized out>, this=0x558347de90a0) at ../src/rgw/rgw_service.h:153
#15 RGWRados::init_svc (this=this@entry=0x558347de8dc0, raw=raw@entry=false, dpp=<optimized out>) at ../src/rgw/rgw_rados.cc:1380
#16 0x00007fa7d930f241 in RGWRados::initialize (this=0x558347de8dc0, dpp=<optimized out>) at ../src/rgw/rgw_rados.cc:1400
#17 0x00007fa7d944f85f in RGWRados::initialize (dpp=<optimized out>, _cct=0x558347c6a320, this=<optimized out>) at ../src/rgw/rgw_rados.h:586
#18 StoreManager::init_storage_provider (dpp=<optimized out>, dpp@entry=0x7ffc7f7a2e90, cct=cct@entry=0x558347c6a320, svc="rados", use_gc_thread=use_gc_thread@entry=true, use_lc_thread=use_lc_thread@entry=true, quota_threads=quota_threads@entry=true, run_sync_thread=true, run_reshard_thread=true, use_cache=true, use_gc=true) at ../src/rgw/rgw_sal.cc:55
#19 0x00007fa7d8e7367a in StoreManager::get_storage (use_gc=true, use_cache=true, run_reshard_thread=true, run_sync_thread=true, quota_threads=true, use_lc_thread=true, use_gc_thread=true, svc="rados", cct=0x558347c6a320, dpp=0x7ffc7f7a2e90) at /usr/include/c++/9/bits/basic_string.h:267
#20 radosgw_Main (argc=<optimized out>, argv=<optimized out>) at ../src/rgw/rgw_main.cc:372
#21 0x0000558347883f56 in main (argc=<optimized out>, argv=<optimized out>) at ../src/rgw/radosgw.cc:12
The main thread is blocked in RGWSI_Notify::init_watch, waiting for a synchronous librados operation on the `notify.0` object that never completes.

Any suggestions on what the problem could be, and how to reset RGW so it can start normally?
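For context, frames #3 to #6 show the pattern RGW is stuck in: `IoCtx::operate` is synchronous, so the calling thread sleeps on a condition variable (with a predicate) until the operation on `notify.0` completes. Below is a minimal C++ sketch of that pattern, assuming a simple completion flag set by whoever receives the reply; `PendingOp` and `simulate_op` are hypothetical names for illustration only, not Ceph or librados API. It contrasts the unbounded predicate wait the backtrace suggests with a bounded `std::condition_variable::wait_for`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for one in-flight operation: a "done" flag that
// another thread (playing the role of the OSD reply) sets.
struct PendingOp {
  std::mutex lock;
  std::condition_variable cond;
  bool done = false;

  // Called by the thread that receives the reply.
  void complete() {
    {
      std::lock_guard<std::mutex> l(lock);
      done = true;
    }
    cond.notify_all();
  }

  // Unbounded predicate wait, like the one in frames #3-#5:
  // if complete() is never called, this never returns.
  void wait_forever() {
    std::unique_lock<std::mutex> l(lock);
    cond.wait(l, [this] { return done; });
  }

  // Bounded variant: returns true if the op finished, false on timeout.
  bool wait_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock);
    return cond.wait_for(l, timeout, [this] { return done; });
  }
};

// Simulates one operation: a helper thread replies after `reply_delay`,
// but only if `reply_arrives` is true. Returns whether the caller saw
// completion within `timeout`.
bool simulate_op(bool reply_arrives,
                 std::chrono::milliseconds reply_delay,
                 std::chrono::milliseconds timeout) {
  PendingOp op;
  std::thread replier([&] {
    std::this_thread::sleep_for(reply_delay);
    if (reply_arrives) op.complete();
  });
  bool finished = op.wait_for(timeout);
  replier.join();
  return finished;
}
```

If the reply never arrives, `wait_forever()` would block indefinitely, which matches the futex wait in frame #0; the bounded variant returns false after the timeout instead.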