>>To my surprise however these slow requests caused aborts from the block device on the VM side, which ended up corrupting files

This is very strange; you shouldn't see corruption.

Do you use writeback? If so, have you disabled barriers on your filesystem?

(What is the QEMU version? The guest OS? The guest OS kernel?)

----- Original Message -----
From: "Krzysztof Nowicki" <krzysztof.a.nowicki@xxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Friday, 6 February 2015 10:16:30
Subject: OSD slow requests causing disk aborts in KVM

Hi all,

I'm running a small Ceph cluster with 4 OSD nodes, which serves as a storage backend for a set of KVM virtual machines. The VMs use RBD for disk storage. On the VM side I'm using virtio-scsi instead of virtio-blk in order to gain DISCARD support.

Each OSD node runs on a separate machine, using a 3TB WD Black drive plus a Samsung SSD for the journal. The machines used for the OSD nodes are not equal in spec: three of them are small servers, while one is a desktop PC. The last node is the one causing trouble.

During high load caused by remapping after one of the other nodes went down, I experienced some slow requests. To my surprise, however, these slow requests caused aborts from the block device on the VM side, which ended up corrupting files.

What I wonder is whether such behaviour (aborts) is normal when slow requests pile up. I always thought that these requests would be delayed but eventually handled. Are there any tunables that would help me avoid such situations? I would really like to avoid VM outages caused by such corruption issues.

I can attach some logs if needed.

Best regards
Chris

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com