Hello Jason,

On 20.05.19 at 23:49, Jason Dillaman wrote:
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin <ms@xxxxxxxxxx> wrote:
>> Hello cephers,
>>
>> we have a few systems which utilize an rbd-nbd map/mount to get access
>> to a rbd volume. (This problem seems to be related to "Slow requests
>> from bluestore osds", the original thread.)
>>
>> Unfortunately the rbd-nbd device of a system has crashed on three
>> Mondays in a row at ~00:00, when the systemd fstrim timer executes
>> "fstrim -av" (which runs in parallel to deep-scrub operations).
>
> That's probably not a good practice if you have lots of VMs doing this
> at the same time *and* you are not using object-map. The reason is that
> "fstrim" could discard huge extents, resulting in around a thousand
> concurrent remove/truncate/zero ops per image being thrown at your
> cluster.

Sure, but currently we do not have lots of VMs which are capable of
running fstrim on rbd volumes. However, the RBD images involved are
multi-TB images with a high write/deletion rate, so I am already in the
process of spreading the fstrims out by adding random delays.

>> After that, the device constantly reports I/O errors every time the
>> filesystem is accessed. Unmounting, remapping and mounting helped to
>> get the filesystem/device back into business :-)
>
> If the cluster was being DDoSed by the fstrims, the VM OSes might have
> timed out thinking there was a controller failure.

Yes and no :-) Probably my problem is related to the kernel release, a
kernel setting or the operating system release. Why?
From my point of view, the error behavior is currently reproducible
with good probability.

Regards
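PS: Regarding the random delays mentioned above: since the trims are
triggered by the systemd fstrim.timer, one possible way to spread them
is a drop-in override using systemd's RandomizedDelaySec= timer option.
A minimal sketch, assuming the stock util-linux fstrim.timer; the
one-hour window is an arbitrary choice:

```ini
# /etc/systemd/system/fstrim.timer.d/override.conf
[Timer]
# Delay each timer activation by a random amount between 0 and 3600 s,
# so the hosts do not all start trimming at the same moment.
RandomizedDelaySec=3600
```

After creating the drop-in, "systemctl daemon-reload" makes the timer
pick up the change.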
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com