We set up a new Nautilus cluster that only runs RGW. While a job was doing 200k IOPS of very small objects, I noticed that HAProxy was kicking out RGW backends because they were taking more than 2 seconds to respond. Once a minute we GET a large (~4 GB) file and use that as a health check to determine whether the system is taking too long to service requests. It looks like other IO is being blocked by this large transfer. This seems to be the case with both civetweb and beast, but I'm double-checking beast at the moment because I'm not 100% sure we were using it from the start.

Any ideas on how to mitigate this? It seems that requests are scheduled onto a thread, and if a request is unlucky enough to land behind a big transfer it is simply stuck; in that case HAProxy can kick out the backend before the request completes, and the client has to re-request it. (Rough, illustrative config sketches are below my signature.)

Thank you,
Robert LeBlanc

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
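
P.S. To make the health-check setup concrete, here is a minimal sketch of the kind of HAProxy backend check described above. The backend/server names, addresses, port, and object path are placeholders, and the check options are illustrative rather than a copy of our config:

    backend rgw
        # Health check: GET the large object; if the check does not
        # complete within 'timeout check' it counts as a failure.
        option httpchk GET /healthcheck/large-object
        http-check expect status 200
        timeout check 2s
        # Probe roughly once per minute; 3 failed checks mark a server down.
        default-server inter 60s fall 3 rise 2
        server rgw1 192.168.0.11:8080 check
        server rgw2 192.168.0.12:8080 check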
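
For reference on the frontend side, the civetweb/beast settings in play look roughly like the ceph.conf fragment below. The section name, port, and thread count are examples only (rgw_thread_pool_size defaults to 512), not something I've verified against this problem:

    [client.rgw.gateway1]
        # beast frontend (what we believe we are running now)
        rgw frontends = beast port=8080
        # Size of the RGW worker thread pool that requests are scheduled onto;
        # small requests stuck behind a large GET compete for these threads.
        rgw thread pool size = 512

        # civetweb equivalent would be along the lines of:
        # rgw frontends = civetweb port=8080 num_threads=512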