On Wed, Apr 25, 2018 at 10:58:43PM +0000, Blair Bethwaite wrote:
:Hi Jon,
:
:On 25 April 2018 at 21:20, Jonathan Proulx <jon@xxxxxxxxxxxxx> wrote:
:>
:> here's a snap of 24hr graph from one server (others are similar in
:> general shape):
:>
:> https://snapshot.raintank.io/dashboard/snapshot/gB3FDPl7uRGWmL17NHNBCuWKGsXdiqlt
:
:That's what, a median IOPS of about 80? Pretty high for spinning disk.
:I'd guess you're seeing write-choking. You might be able to improve
:things a bit by upping your librbd cache size (though obviously that
:would only have an effect on new or reset instances), also perhaps
:double check your block queue scheduler max_sectors_kb inside a guest
:and make sure you're not splitting up all writes into 512 byte chunks.
:But does kinda look like you need more hardware, and fast.

Those block queue scheduler tips *might* help me squeeze a bit more
till the next budget starts July 1...

Seeing yesterday that I have 75% more VMs running than I thought does
change my perspective a bit, and makes the "no, we're really just
crushed" analysis more plausible!

Thanks,
-Jon
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
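
For anyone wanting to run the max_sectors_kb check suggested above inside a guest, here is a minimal sketch in Python. It assumes a Linux guest that exposes the usual sysfs queue attributes under /sys/block/<dev>/queue/. The function names and the 4 KB "suspiciously small" cutoff are my own illustrative choices, not anything from the thread or from Ceph.

```python
from pathlib import Path

# sysfs request-queue attributes worth looking at for each block device
QUEUE_ATTRS = ("max_sectors_kb", "max_hw_sectors_kb", "scheduler")


def read_queue_limits(sys_block="/sys/block"):
    """Return {device: {attr: value}} for every block device found."""
    limits = {}
    root = Path(sys_block)
    if not root.is_dir():
        return limits
    for dev in sorted(root.iterdir()):
        entry = {}
        for attr in QUEUE_ATTRS:
            try:
                entry[attr] = (dev / "queue" / attr).read_text().strip()
            except OSError:
                # attribute missing or unreadable on this device; skip it
                continue
        if entry:
            limits[dev.name] = entry
    return limits


def tiny_request_devices(limits, threshold_kb=4):
    """List devices whose max_sectors_kb is at or below threshold_kb.

    threshold_kb is a hypothetical cutoff chosen for illustration only.
    """
    flagged = []
    for name, entry in limits.items():
        value = entry.get("max_sectors_kb", "")
        if value.isdigit() and int(value) <= threshold_kb:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    limits = read_queue_limits()
    for name, entry in limits.items():
        print(name, entry)
    print("suspiciously small max_sectors_kb:", tiny_request_devices(limits))
```

Run it inside a guest as an ordinary user. It only reads sysfs and changes nothing. A device that gets flagged is one where the kernel will cap each request at a very small size, which is the write-splitting case described above.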