co-located cephfs client deadlock

Hi all,

We have been benchmarking a hyperconverged CephFS cluster (kernel
clients and OSDs on the same machines) for a while. Over the weekend,
for the first time, one CephFS mount deadlocked while some clients
were running ior.

All the ior processes are stuck in D state with this stack:

[<ffffffffafdb53a3>] wait_on_page_bit+0x83/0xa0
[<ffffffffafdb54d1>] __filemap_fdatawait_range+0x111/0x190
[<ffffffffafdb5564>] filemap_fdatawait_range+0x14/0x30
[<ffffffffafdb79e6>] filemap_write_and_wait_range+0x56/0x90
[<ffffffffc0f11575>] ceph_fsync+0x55/0x420 [ceph]
[<ffffffffafe76247>] do_fsync+0x67/0xb0
[<ffffffffafe76530>] SyS_fsync+0x10/0x20
[<ffffffffb0372d5b>] system_call_fastpath+0x22/0x27
[<ffffffffffffffff>] 0xffffffffffffffff
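
For anyone who wants to collect the same information on their own nodes, here is a rough sketch of how stacks like the one above can be gathered for every task in D state. It only reads the standard /proc/<pid>/stat and /proc/<pid>/stack files; the function names (parse_stat, find_blocked) are made up for illustration, and reading /proc/<pid>/stack normally requires root and a kernel built with stack tracing support.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen].strip())
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def find_blocked(proc_root="/proc"):
    """Yield (pid, comm, stack) for tasks in uninterruptible sleep (state D)."""
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited in the meantime, or stat was unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(base, "stack")) as f:
                stack = f.read()
        except OSError:
            # Assumption: typically a permissions problem (not running as root).
            stack = "(kernel stack unavailable)\n"
        yield pid, comm, stack


if __name__ == "__main__":
    for pid, comm, stack in find_blocked():
        print(f"{pid} {comm}")
        print(stack)
```

Run as root on an affected node, this prints one pid/command line per blocked task followed by its kernel stack, which makes it easy to confirm whether all stuck tasks are waiting in the same place.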

We tried restarting the co-located OSDs, and tried evicting the
client, but the processes stay deadlocked.

We've seen the recent issue related to co-location
(https://bugzilla.redhat.com/show_bug.cgi?id=1665248) but we don't
have the `usercopy` warning in dmesg.

Are there other known issues related to co-locating?

Thanks!
Dan