Hello Jason,

On 18.07.19 at 20:10, Jason Dillaman wrote:
> On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin <ms@xxxxxxxxxx> wrote:
>> Hello cephers,
>>
>> rbd-nbd crashes in a reproducible way here.
> I don't see a crash report in the log below. Is it really crashing or
> is it shutting down? If it is crashing and it's reproducable, can you
> install the debuginfo packages, attach gdb, and get a full backtrace
> of the crash?

I do not get a crash report from rbd-nbd.
It seems that "rbd-nbd" just terminates, which crashes the XFS filesystem because the nbd device is no longer available ("rbd nbd ls" no longer shows a mapped device).

>
> It seems like your cluster cannot keep up w/ the load and the nbd
> kernel driver is timing out the IO and shutting down. There is a
> "--timeout" option on "rbd-nbd" that you can use to increase the
> kernel IO timeout for nbd.
>

I also have a 36TB XFS (non_ec) volume on this virtual system, mapped by krbd, which is under really heavy read/write usage.
I have never experienced problems like this on this system with similar usage patterns.

The volume involved in the problem handles only a really low load, and I was able to reproduce the error with this simple command:

find . -type f -name "*.sql" -exec ionice -c3 nice -n 20 gzip -v {} \;

I copied and read ~1.5 TB of data to this volume without a problem, so it seems that the gzip command provokes an I/O pattern that leads to the error.

As described, I use a luminous "12.2.11" client, which does not support the "--timeout" option (btw. a backport would be nice).
Our ceph system runs with a heavy write load, therefore we already set a 60-second timeout using the following code:
(https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c)

We have ~500 heavy-load rbd-nbd devices in our xen cluster (rbd-nbd 12.2.5, kernel 4.4.0+10, centos clone) and ~20 high-load krbd devices (kernel 4.15.0-45, ubuntu 16.04); we have never experienced problems like this there.
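
A note on the 60-second timeout mentioned above: the nbd kernel driver exposes its I/O timeout through the NBD_SET_TIMEOUT ioctl from <linux/nbd.h>. The following is only a minimal Python sketch of that mechanism, not the code from the linked helper. The function name set_nbd_timeout is made up for illustration, and the request number assumes the x86/ARM ioctl encoding.

```python
import fcntl
import os

# _IO(0xab, 9) from <linux/nbd.h>. This encoding holds on x86 and ARM,
# where _IOC_NONE is 0; other architectures may encode it differently.
NBD_SET_TIMEOUT = (0xAB << 8) | 9


def set_nbd_timeout(device: str, seconds: int) -> None:
    """Ask the nbd kernel driver to use `seconds` as its I/O timeout.

    Hypothetical helper for illustration; needs root and an nbd device node.
    """
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    fd = os.open(device, os.O_RDWR)
    try:
        # The timeout is passed by value as the ioctl argument.
        fcntl.ioctl(fd, NBD_SET_TIMEOUT, seconds)
    finally:
        os.close(fd)
```

Called as set_nbd_timeout("/dev/nbd0", 60), this would request a 60-second timeout for that device. The device path here is just an example, and whether the setting survives a later remap depends on the kernel and rbd-nbd version.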
We only experience problems like this with rbd-nbd > 12.2.5 on ubuntu 16.04 (kernel 4.15) or ubuntu 18.04 (kernel 4.15), with or without erasure coding.

Regards
Marc