Which kernel version are you using? We've had lots of problems with
random deadlocks in kernels with cephfs, but 4.19 seems to be pretty
stable.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Mon, Apr 1, 2019 at 12:45 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> Hi all,
>
> We have been benchmarking a hyperconverged cephfs cluster (kernel
> clients + OSDs on the same machines) for a while. Over the weekend
> (for the first time) one cephfs mount deadlocked while some clients
> were running ior.
>
> All the ior processes are stuck in D state with this stack:
>
> [<ffffffffafdb53a3>] wait_on_page_bit+0x83/0xa0
> [<ffffffffafdb54d1>] __filemap_fdatawait_range+0x111/0x190
> [<ffffffffafdb5564>] filemap_fdatawait_range+0x14/0x30
> [<ffffffffafdb79e6>] filemap_write_and_wait_range+0x56/0x90
> [<ffffffffc0f11575>] ceph_fsync+0x55/0x420 [ceph]
> [<ffffffffafe76247>] do_fsync+0x67/0xb0
> [<ffffffffafe76530>] SyS_fsync+0x10/0x20
> [<ffffffffb0372d5b>] system_call_fastpath+0x22/0x27
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> We tried restarting the co-located OSDs and evicting the client, but
> the processes stay deadlocked.
>
> We've seen the recent issue related to co-location
> (https://bugzilla.redhat.com/show_bug.cgi?id=1665248), but we don't
> have the `usercopy` warning in dmesg.
>
> Are there other known issues related to co-location?
>
> Thanks!
> Dan
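
P.S. A few commands that are useful for this kind of diagnosis. For the
kernel version and for finding the D-state tasks and their stacks, a
minimal sketch, assuming root on the client and sysrq enabled (<pid> is
a placeholder for one of the stuck ior processes):

    # kernel version of the cephfs kernel client
    uname -r

    # processes currently in uninterruptible sleep (D state)
    ps -eo pid,stat,comm | awk 'NR==1 || $2 ~ /^D/'

    # kernel stack of one stuck process
    cat /proc/<pid>/stack

    # or dump the stacks of all blocked tasks to the kernel log
    echo w > /proc/sysrq-trigger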
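
The eviction Dan mentions is normally done against the active MDS; for
reference, a minimal sketch of the usual sequence (mds.0 and <id> stand
in for the active MDS and the stuck client's session id):

    # list client sessions to find the stuck mount
    ceph tell mds.0 client ls

    # evict that session (by default this also blacklists the client
    # on the OSDs)
    ceph tell mds.0 client evict id=<id>

    # the resulting blacklist entries are visible with
    ceph osd blacklist ls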
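
And the `usercopy` check from the Bugzilla report above is just a grep
of the kernel log; a sketch, assuming the warning text matches the
report and that the hung-task watchdog is enabled:

    # warning associated with the co-location bug (BZ 1665248)
    dmesg | grep -i usercopy

    # generic hung-task watchdog messages
    dmesg | grep "blocked for more than"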