On Tue, Jul 28, 2015 at 7:20 PM, van <chaofanyu@xxxxxxxxxxx> wrote:
>
>> On Jul 28, 2015, at 7:57 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>
>> On Tue, Jul 28, 2015 at 2:46 PM, van <chaofanyu@xxxxxxxxxxx> wrote:
>>> Hi, Ilya,
>>>
>>> In the dmesg there are also a lot of libceph socket errors, which I
>>> think may be caused by my stopping the ceph service without unmapping
>>> the rbd device.
>>
>> Well, sure enough, if you kill all OSDs, the filesystem mounted on top
>> of the rbd device will get stuck.
>
> Sure, it will get stuck if the OSDs are stopped. And since rados requests
> have a retry policy, the stuck requests will recover after I start the
> daemons again.
>
> But in my case the OSDs are running in a normal state and the librbd API
> can read and write normally. Meanwhile, a heavy fio test on the
> filesystem mounted on top of the rbd device gets stuck.
>
> I wonder if this phenomenon is triggered by running the rbd kernel
> client on machines that also run ceph daemons, i.e. the annoying
> loopback mount deadlock issue.
>
> In my opinion, if it were due to the loopback mount deadlock, the OSDs
> would become unresponsive no matter whether the requests come from user
> space (like the API) or from the kernel client.
> Am I right?

Not necessarily.

>
> If so, my case seems to be triggered by another bug.
>
> Anyway, it seems that I should at least separate the client and the
> daemons.

Try 3.18.19 if you can.  I'd be interested in your results.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
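
For reference, a minimal sketch of the kind of userspace librbd read/write
check described above ("the librbd API can read and write normally"), using
the python-rados / python-rbd bindings. The conffile path, pool name ("rbd")
and image name ("test") are assumptions here; adjust them to the actual
cluster. If this succeeds while I/O through the mapped /dev/rbdX device is
stuck, the problem is on the kernel-client path rather than with the OSDs.

    # Userspace librbd smoke test (python-rados / python-rbd).
    # Assumed: /etc/ceph/ceph.conf, pool "rbd", existing image "test".
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')        # assumed pool name
        try:
            image = rbd.Image(ioctx, 'test')     # assumed image name
            try:
                image.write(b'\xaa' * 4096, 0)   # 4 KiB write at offset 0
                data = image.read(0, 4096)       # read it back
                assert data == b'\xaa' * 4096
                print('librbd userspace read/write OK')
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()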