We set up a new Nautilus cluster that only runs RGW. While a job was doing 200k IOPS of very small objects, I noticed that HAProxy was kicking out RGW backends because they were taking more than 2 seconds to respond. Once a minute we GET a large (~4 GB) file and use that as a health check to determine whether the system is taking too long to service requests. It looks like other IO is being blocked by this large transfer. This seems to be the case with both civetweb and beast, but I'm double-checking beast at the moment because I'm not 100% sure we were using it from the start.

Any ideas on how to mitigate this? It seems that requests are scheduled onto a thread, and if a request is unlucky enough to land behind a big transfer it is simply stuck; in that case HAProxy can kick out the backend before the request completes, and the client has to re-request it. (Rough, illustrative config sketches are below my signature.)

Thank you,
Robert LeBlanc

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
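
P.S. To make the health-check setup concrete, here is a minimal sketch of the kind of HAProxy backend check described above. The backend/server names, addresses, port, and object path are placeholders, and the check options are illustrative rather than a copy of our config:

    backend rgw
        # Health check: GET the large object; if the check does not
        # complete within 'timeout check' it counts as a failure.
        option httpchk GET /healthcheck/large-object
        http-check expect status 200
        timeout check 2s
        # Probe roughly once per minute; 3 failed checks mark a server down.
        default-server inter 60s fall 3 rise 2
        server rgw1 192.168.0.11:8080 check
        server rgw2 192.168.0.12:8080 check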
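
For reference on the frontend side, the civetweb/beast settings in play look roughly like the ceph.conf fragment below. The section name, port, and thread count are examples only (rgw_thread_pool_size defaults to 512), not something I've verified against this problem:

    [client.rgw.gateway1]
        # beast frontend (what we believe we are running now)
        rgw frontends = beast port=8080
        # Size of the RGW worker thread pool that requests are scheduled onto;
        # small requests stuck behind a large GET compete for these threads.
        rgw thread pool size = 512

        # civetweb equivalent would be along the lines of:
        # rgw frontends = civetweb port=8080 num_threads=512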