Hi dongsheng https://tracker.ceph.com/issues/43178 , Can I open a PR to fix it like you said?Thanks. yangjun@xxxxxxxxxxxxxxxxxxxx From: Jason Dillaman Date: 2019-12-06 22:58 To: Dongsheng Yang CC: yangjun@xxxxxxxxxxxxxxxxxxxx; ceph-users Subject: Re: rbd_open_by_id crash when connection timeout On Fri, Dec 6, 2019 at 9:51 AM Dongsheng Yang <dongsheng.yang@xxxxxxxxxxxx> wrote: > > > 在 12/6/2019 9:46 PM, Jason Dillaman 写道: > > On Fri, Dec 6, 2019 at 12:12 AM Dongsheng Yang > > <dongsheng.yang@xxxxxxxxxxxx> wrote: > >> > >> > >> On 12/06/2019 12:50 PM, yangjun@xxxxxxxxxxxxxxxxxxxx wrote: > >> > >> Hi Jason, dongsheng > >> I found a problem using rbd_open_by_id when connection timeout(errno = 110, ceph version 12.2.8, there is no change about rbd_open_by_id in master branch). > >> int r = ictx->state->open(false); > >> if (r < 0) { // r = -110 > >> delete ictx; // crash,the stack is shown below: > >> } else { > >> *image = (rbd_image_t)ictx; > >> } > >> Stack is : > >> (gdb) bt > >> Program terminated with signal 11, Segmentation fault. > >> #0 Mutex::Lock (this=this@entry=0x8, no_lockdep=no_lockdep@entry=false) > >> at /var/lib/jenkins/workspace/ceph_L/ceph/src/common/Mutex.cc:92 > >> #1 0x0000ffff765eb814 in Locker (m=..., this=<synthetic pointer>) > >> at /var/lib/jenkins/workspace/ceph_L/ceph/src/common/Mutex.h:115 > >> #2 PerfCountersCollection::remove (this=0x0, l=0xffff85bf0588 <main_arena+88>) > >> at /var/lib/jenkins/workspace/ceph_L/ceph/src/common/perf_counters.cc:62 > >> #3 0x0000ffff757f26c4 in librbd::ImageCtx::perf_stop (this=0x2065d2f0) > >> at /var/lib/jenkins/workspace/ceph_L/ceph/src/librbd/ImageCtx.cc:426 > >> #4 0x0000ffff757f6e48 in librbd::ImageCtx::~ImageCtx (this=0x2065d2f0, __in_chrg=<optimized out>) > >> at /var/lib/jenkins/workspace/ceph_L/ceph/src/librbd/ImageCtx.cc:239 > >> #5 0x0000ffff757e8e70 in rbd_open_by_id (p=p@entry=0x2065e780, > >> id=id@entry=0xffff85928bb4 "20376aa706c0", image=image@entry=0xffff75b9b0b8, > >> snap_name=snap_name@entry=0x0) at /var/lib/jenkins/workspace/ceph_L/ceph/src/librbd/librbd.cc:2692 > >> #6 0x0000ffff75aed524 in __pyx_pf_3rbd_5Image___init__ ( > >> __pyx_v_read_only=0xffff85f02bb8 <_Py_ZeroStruct>, __pyx_v_snapshot=0xffff85f15938 <_Py_NoneStruct>, > >> __pyx_v_by_id=0xffff85f02ba0 <_Py_TrueStruct>, __pyx_v_name=0xffff85928b90, > >> __pyx_v_ioctx=0xffff7f0283d0, __pyx_v_self=0xffff75b9b0a8) > >> at /var/lib/jenkins/workspace/ceph_L/ceph/build/src/pybind/rbd/pyrex/rbd.c:12662 > >> #7 __pyx_pw_3rbd_5Image_1__init__ (__pyx_v_self=0xffff75b9b0a8, __pyx_args=<optimized out>, > >> __pyx_kwds=<optimized out>) > >> at /var/lib/jenkins/workspace/ceph_L/ceph/build/src/pybind/rbd/pyrex/rbd.c:12378 > >> #8 0x0000ffff85e02924 in type_call () from /lib64/libpython2.7.so.1.0 > >> #9 0x0000ffff85dac254 in PyObject_Call () from /lib64/libpython2.7.so.1.0 > > I suspect you will hit tons of errors and corruption and leaking RADOS > > objects when you start injecting RADOS timeouts. It's not something > > librbd really supports or tests against -- so you need to expect the > > unexpected. Definitely feel free to open a PR to fix it, though -- but > > otherwise it would be a really *low* priority to fix. > > > >> Meanwhile, I have sone questions about rbd_open, if state->open failed(return value r < 0), Is there a risk of memory leaks about ictx? > >> extern "C" int rbd_open(rados_ioctx_t p, const char *name, rbd_image_t *image, > >> const char *snap_name) > >> { > >> librados::IoCtx io_ctx; > >> librados::IoCtx::from_rados_ioctx_t(p, io_ctx); > >> TracepointProvider::initialize<tracepoint_traits>(get_cct(io_ctx)); > >> librbd::ImageCtx *ictx = new librbd::ImageCtx(name, "", snap_name, io_ctx, > >> false); > >> tracepoint(librbd, open_image_enter, ictx, ictx->name.c_str(), ictx->id.c_str(), ictx->snap_name.c_str(), ictx->read_only); > >> > >> int r = ictx->state->open(false); > >> if (r >= 0) { // if r < 0, Is there a risk of memory leaks? > >> > >> > >> Good catch. I think there is. And same at rbd_open_read_only. Seems we need a patch to fix them. > > No, that's not a memory leak. The blocking version of > > "ImageState::open" will free the "ImageCtx" memory upon failure. > > > My bad, I did not look it carefully before repling this mail, > > I realized that we delete ctx in rbd_open_by_id but not in rbd_open, so > I thought it's a memory leak > > in rbd_open. But I was wrong as you mentioned, it is released in > state->open(). So we dont need to add > > a delete in rbd_open(), on the contrary, we need to remove the "delete > ictx" to avoid double-free problem as below: > > > diff --git a/src/librbd/librbd.cc b/src/librbd/librbd.cc > index 6c303c9..b518fad 100644 > --- a/src/librbd/librbd.cc > +++ b/src/librbd/librbd.cc > @@ -3924,9 +3924,7 @@ extern "C" int rbd_open_by_id(rados_ioctx_t p, > const char *id, > ictx->id.c_str(), ictx->snap_name.c_str(), ictx->read_only); > > int r = ictx->state->open(0); > - if (r < 0) { > - delete ictx; > - } else { > + if (r >= 0) { > *image = (rbd_image_t)ictx; > } > tracepoint(librbd, open_image_exit, r); > @@ -3999,9 +3997,7 @@ extern "C" int > rbd_open_by_id_read_only(rados_ioctx_t p, const char *id, > ictx->id.c_str(), ictx->snap_name.c_str(), ictx->read_only); > > int r = ictx->state->open(0); > - if (r < 0) { > - delete ictx; > - } else { > + if (r >= 0) { > *image = (rbd_image_t)ictx; > } > > tracepoint(librbd, open_image_exit, r); > > > There is a reproduce test.c attached. Opened https://tracker.ceph.com/issues/43178 > Thanx > > Yang > > > > >> Thanx > >> Yang > >> > >> *image = (rbd_image_t)ictx; > >> } > >> tracepoint(librbd, open_image_exit, r); > >> return r; > >> } > >> > >> Can you give me some advice? many thanks. > >> > >> > >> ________________________________ > >> yangjun@xxxxxxxxxxxxxxxxxxxx > >> > >> > > Thanks, -- Jason _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx