On Tue, Jul 28, 2015 at 7:20 PM, van <chaofanyu@xxxxxxxxxxx> wrote:
>
>> On Jul 28, 2015, at 7:57 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>
>> On Tue, Jul 28, 2015 at 2:46 PM, van <chaofanyu@xxxxxxxxxxx> wrote:
>>> Hi, Ilya,
>>>
>>> In the dmesg there are also a lot of libceph socket errors, which I
>>> think may be caused by my stopping the ceph service without unmapping
>>> the rbd device.
>>
>> Well, sure enough, if you kill all OSDs, the filesystem mounted on top
>> of the rbd device will get stuck.
>
> Sure, it will get stuck if the OSDs are stopped. And since rados requests
> have a retry policy, the stuck requests will recover after I start the
> daemons again.
>
> But in my case the OSDs are running in a normal state and the librbd API
> can read and write normally. Meanwhile, a heavy fio test on the
> filesystem mounted on top of the rbd device gets stuck.
>
> I wonder if this phenomenon is triggered by running the rbd kernel
> client on machines that also run ceph daemons, i.e. the annoying
> loopback mount deadlock issue.
>
> In my opinion, if it were due to the loopback mount deadlock, the OSDs
> would become unresponsive no matter whether the requests come from user
> space (like the API) or from the kernel client.
> Am I right?

Not necessarily.

>
> If so, my case seems to be triggered by another bug.
>
> Anyway, it seems that I should at least separate the client and the
> daemons.

Try 3.18.19 if you can.  I'd be interested in your results.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
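
For reference, a minimal sketch of the kind of userspace librbd read/write
check described above ("the librbd API can read and write normally"), using
the python-rados / python-rbd bindings. The conffile path, pool name ("rbd")
and image name ("test") are assumptions here; adjust them to the actual
cluster. If this succeeds while I/O through the mapped /dev/rbdX device is
stuck, the problem is on the kernel-client path rather than with the OSDs.

    # Userspace librbd smoke test (python-rados / python-rbd).
    # Assumed: /etc/ceph/ceph.conf, pool "rbd", existing image "test".
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')        # assumed pool name
        try:
            image = rbd.Image(ioctx, 'test')     # assumed image name
            try:
                image.write(b'\xaa' * 4096, 0)   # 4 KiB write at offset 0
                data = image.read(0, 4096)       # read it back
                assert data == b'\xaa' * 4096
                print('librbd userspace read/write OK')
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()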