Initialization timeout, failed to initialize

I am experimenting with multi-site zones for the Ceph Object Gateway (RGW).

Ceph version: 17.2.5
My setup: 3-zone multi-site, 3-way full sync mode;
each zone has 3 machines, each running RGW+MON+OSD
Load test: 3000 concurrent uploads of 1M objects

After about 3-4 minutes of load the RGW machines get stuck: in 2 of the 3 zones RGW stops responding (e.g. to curl $RGW:80).
An attempt to restart RGW ends with `Initialization timeout, failed to initialize`.

Here is a gdb backtrace showing where it hangs after the restart:

(gdb) inf thr
  Id   Target Id                                           Frame
* 1    Thread 0x7fa7d3abbcc0 (LWP 30791) "radosgw"         futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffc7f7a2438) at ../sysdeps/nptl/futex-internal.h:183
...

(gdb) bt
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffc7f7a2438) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7ffc7f7a2488, cond=0x7ffc7f7a2410) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x7ffc7f7a2410, mutex=0x7ffc7f7a2488) at pthread_cond_wait.c:647
#3  0x00007fa7d7097e42 in ceph::condition_variable_debug::wait (this=this@entry=0x7ffc7f7a2410, lock=...) at ../src/common/mutex_debug.h:148
#4  0x00007fa7d7953cba in ceph::condition_variable_debug::wait<librados::IoCtxImpl::operate(const object_t&, ObjectOperation*, ceph::real_time*, int)::<lambda()> > (pred=..., lock=..., this=0x7ffc7f7a2410) at ../src/librados/IoCtxImpl.cc:672
#5  librados::IoCtxImpl::operate (this=this@entry=0x558347c21010, oid=..., o=0x558347e12310, pmtime=<optimized out>, flags=<optimized out>) at ../src/librados/IoCtxImpl.cc:672
#6  0x00007fa7d792bd55 in librados::v14_2_0::IoCtx::operate (this=this@entry=0x558347e44760, oid="notify.0", o=o@entry=0x7ffc7f7a2690, flags=flags@entry=0) at ../src/librados/librados_cxx.cc:1536
#7  0x00007fa7d9490ad1 in rgw_rados_operate (dpp=<optimized out>, ioctx=..., oid="notify.0", op=op@entry=0x7ffc7f7a2690, y=..., flags=0) at ../src/rgw/rgw_tools.cc:277
#8  0x00007fa7d9627e0f in RGWSI_RADOS::Obj::operate (this=this@entry=0x558347e44710, dpp=<optimized out>, op=op@entry=0x7ffc7f7a2690, y=..., flags=flags@entry=0) at ../src/rgw/services/svc_rados.h:112
#9  0x00007fa7d96209a5 in RGWSI_Notify::init_watch (this=this@entry=0x558347c49530, dpp=<optimized out>, y=...) at ../src/rgw/services/svc_notify.cc:214
#10 0x00007fa7d962161b in RGWSI_Notify::do_start (this=0x558347c49530, y=..., dpp=<optimized out>) at ../src/rgw/services/svc_notify.cc:277
#11 0x00007fa7d8f17bcf in RGWServiceInstance::start (this=0x558347c49530, y=..., dpp=<optimized out>) at ../src/rgw/rgw_service.cc:331
#12 0x00007fa7d8f1a260 in RGWServices_Def::init (this=this@entry=0x558347de90a0, cct=<optimized out>, have_cache=<optimized out>, raw=raw@entry=false, run_sync=<optimized out>, y=..., dpp=<optimized out>) at /usr/include/c++/9/bits/unique_ptr.h:360
#13 0x00007fa7d8f1cc40 in RGWServices::do_init (this=this@entry=0x558347de90a0, _cct=<optimized out>, have_cache=<optimized out>, raw=raw@entry=false, run_sync=<optimized out>, y=..., dpp=<optimized out>) at ../src/rgw/rgw_service.cc:284
#14 0x00007fa7d92a7b1f in RGWServices::init (dpp=<optimized out>, y=..., run_sync=<optimized out>, have_cache=<optimized out>, cct=<optimized out>, this=0x558347de90a0) at ../src/rgw/rgw_service.h:153
#15 RGWRados::init_svc (this=this@entry=0x558347de8dc0, raw=raw@entry=false, dpp=<optimized out>) at ../src/rgw/rgw_rados.cc:1380
#16 0x00007fa7d930f241 in RGWRados::initialize (this=0x558347de8dc0, dpp=<optimized out>) at ../src/rgw/rgw_rados.cc:1400
#17 0x00007fa7d944f85f in RGWRados::initialize (dpp=<optimized out>, _cct=0x558347c6a320, this=<optimized out>) at ../src/rgw/rgw_rados.h:586
#18 StoreManager::init_storage_provider (dpp=<optimized out>, dpp@entry=0x7ffc7f7a2e90, cct=cct@entry=0x558347c6a320, svc="rados", use_gc_thread=use_gc_thread@entry=true, use_lc_thread=use_lc_thread@entry=true, quota_threads=quota_threads@entry=true, run_sync_thread=true, run_reshard_thread=true, use_cache=true,
    use_gc=true) at ../src/rgw/rgw_sal.cc:55
#19 0x00007fa7d8e7367a in StoreManager::get_storage (use_gc=true, use_cache=true, run_reshard_thread=true, run_sync_thread=true, quota_threads=true, use_lc_thread=true, use_gc_thread=true, svc="rados", cct=0x558347c6a320, dpp=0x7ffc7f7a2e90) at /usr/include/c++/9/bits/basic_string.h:267
#20 radosgw_Main (argc=<optimized out>, argv=<optimized out>) at ../src/rgw/rgw_main.cc:372
#21 0x0000558347883f56 in main (argc=<optimized out>, argv=<optimized out>) at ../src/rgw/radosgw.cc:12
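
In the trace above the main thread is waiting inside RGWSI_Notify::init_watch, in a librados operate call on the object "notify.0". For comparing traces from several stuck RGW instances, here is a minimal Python sketch that reduces gdb `bt` output to frame number, function, source location and any quoted `oid=` argument. It is not part of the original report: the names `Frame`, `parse_backtrace` and `SAMPLE` are hypothetical, and the parsing assumes the frame layout shown above (frame #18 in the sample is shortened for brevity).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# "#<n>", optional "0x... in ", then the rest of the frame.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.*)$")
# Source location printed by gdb after the argument list.
LOCATION_RE = re.compile(r"\) at (\S+)$")
# Only quoted object ids, e.g. oid="notify.0" (oid=... is ignored).
OID_RE = re.compile(r'\boid="([^"]+)"')


@dataclass
class Frame:
    index: int
    function: str
    location: Optional[str]
    oid: Optional[str]


def parse_backtrace(text: str) -> List[Frame]:
    """Parse gdb 'bt' output into a list of Frame records."""
    joined: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("(gdb)"):
            continue  # skip blank lines and gdb prompts
        if FRAME_RE.match(stripped):
            joined.append(stripped)
        elif joined:
            # gdb wraps long argument lists; glue them to the frame.
            joined[-1] += " " + stripped

    frames: List[Frame] = []
    for line in joined:
        match = FRAME_RE.match(line)
        rest = match.group(2)
        # The argument list starts at the first " (" in these frames.
        function = rest.partition(" (")[0]
        loc = LOCATION_RE.search(rest)
        oid = OID_RE.search(rest)
        frames.append(Frame(
            index=int(match.group(1)),
            function=function,
            location=loc.group(1) if loc else None,
            oid=oid.group(1) if oid else None,
        ))
    return frames


# Frames #6 and #9 are copied from the trace above; #18 is shortened.
SAMPLE = '''
#6  0x00007fa7d792bd55 in librados::v14_2_0::IoCtx::operate (this=this@entry=0x558347e44760, oid="notify.0", o=o@entry=0x7ffc7f7a2690, flags=flags@entry=0) at ../src/librados/librados_cxx.cc:1536
#9  0x00007fa7d96209a5 in RGWSI_Notify::init_watch (this=this@entry=0x558347c49530, dpp=<optimized out>, y=...) at ../src/rgw/services/svc_notify.cc:214
#18 StoreManager::init_storage_provider (dpp=<optimized out>, use_cache=true,
    use_gc=true) at ../src/rgw/rgw_sal.cc:55
(gdb)
'''

if __name__ == "__main__":
    for frame in parse_backtrace(SAMPLE):
        print(frame.index, frame.function, frame.location, frame.oid)
```

Saving the output of `bt` from each stuck process to a file and running it through `parse_backtrace` makes it easy to see whether every instance is blocked on the same object.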

Any suggestions on what the problem might be, and how to reset RGW so that it can start normally?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx