Which kernel version are you using? We've had lots of problems with
random deadlocks in kernels with cephfs, but 4.19 seems to be pretty
stable.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Mon, Apr 1, 2019 at 12:45 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> Hi all,
>
> We have been benchmarking a hyperconverged cephfs cluster (kernel
> clients + OSDs on the same machines) for a while. Over the weekend
> (for the first time) one cephfs mount deadlocked while some clients
> were running ior.
>
> All the ior processes are stuck in D state with this stack:
>
> [<ffffffffafdb53a3>] wait_on_page_bit+0x83/0xa0
> [<ffffffffafdb54d1>] __filemap_fdatawait_range+0x111/0x190
> [<ffffffffafdb5564>] filemap_fdatawait_range+0x14/0x30
> [<ffffffffafdb79e6>] filemap_write_and_wait_range+0x56/0x90
> [<ffffffffc0f11575>] ceph_fsync+0x55/0x420 [ceph]
> [<ffffffffafe76247>] do_fsync+0x67/0xb0
> [<ffffffffafe76530>] SyS_fsync+0x10/0x20
> [<ffffffffb0372d5b>] system_call_fastpath+0x22/0x27
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> We tried restarting the co-located OSDs and evicting the client, but
> the processes stay deadlocked.
>
> We've seen the recent issue related to co-location
> (https://bugzilla.redhat.com/show_bug.cgi?id=1665248), but we don't
> have the `usercopy` warning in dmesg.
>
> Are there other known issues related to co-location?
>
> Thanks!
> Dan
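
P.S. A few commands that are useful for this kind of diagnosis. For the
kernel version and for finding the D-state tasks and their stacks, a
minimal sketch, assuming root on the client and sysrq enabled (<pid> is
a placeholder for one of the stuck ior processes):

    # kernel version of the cephfs kernel client
    uname -r

    # processes currently in uninterruptible sleep (D state)
    ps -eo pid,stat,comm | awk 'NR==1 || $2 ~ /^D/'

    # kernel stack of one stuck process
    cat /proc/<pid>/stack

    # or dump the stacks of all blocked tasks to the kernel log
    echo w > /proc/sysrq-trigger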
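
The eviction Dan mentions is normally done against the active MDS; for
reference, a minimal sketch of the usual sequence (mds.0 and <id> stand
in for the active MDS and the stuck client's session id):

    # list client sessions to find the stuck mount
    ceph tell mds.0 client ls

    # evict that session (by default this also blacklists the client
    # on the OSDs)
    ceph tell mds.0 client evict id=<id>

    # the resulting blacklist entries are visible with
    ceph osd blacklist ls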
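
And the `usercopy` check from the Bugzilla report above is just a grep
of the kernel log; a sketch, assuming the warning text matches the
report and that the hung-task watchdog is enabled:

    # warning associated with the co-location bug (BZ 1665248)
    dmesg | grep -i usercopy

    # generic hung-task watchdog messages
    dmesg | grep "blocked for more than"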