> On Jul 28, 2015, at 7:57 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> On Tue, Jul 28, 2015 at 2:46 PM, van <chaofanyu@xxxxxxxxxxx> wrote:
>> Hi, Ilya,
>>
>> In the dmesg, there is also a lot of libceph socket error, which I think
>> may be caused by my stopping ceph service without unmap rbd.
>
> Well, sure enough, if you kill all OSDs, the filesystem mounted on top
> of rbd device will get stuck.

Sure, it will get stuck if the OSDs are stopped. And since RADOS requests are retried, the stuck requests recover once I start the daemons again.

But in my case, the OSDs are running normally and the librbd API can read and write without problems. Meanwhile, a heavy fio test on the filesystem mounted on top of the rbd device gets stuck.

I wonder if this is triggered by running the rbd kernel client on machines that also run Ceph daemons, i.e. the annoying loopback mount deadlock issue.

In my opinion, if it were due to the loopback mount deadlock, the OSDs would become unresponsive, regardless of whether the requests come from user space (e.g. the librbd API) or from the kernel client. Am I right?

If so, my case seems to be triggered by another bug.

Anyway, it seems that I should at least separate the client and the daemons.

Thanks.

>
> Thanks,
>
>                 Ilya

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com