On Sat, Sep 8, 2018 at 1:52 AM Tyler Bishop <tyler.bishop@xxxxxxxxxxxxxxxxx> wrote:
>
> I have a fairly large cluster running Ceph BlueStore with extremely fast SAS SSDs for the metadata. Doing FIO benchmarks I get 200k-300k random write IOPS, but during sustained ElasticSearch workloads my clients seem to hit a wall of around 1100 IOPS per RBD device. I've tried 1 RBD device and 4 RBD devices, and I still only get ~1100 IOPS per device, so 4 devices get me around 4k.
>
> Is there some sort of setting that limits each RBD device's performance? I've tried playing with nr_requests, but that doesn't seem to change it at all. I'm just looking for another 20-30% random write performance. I even thought about doing RAID 0 across 4-8 RBD devices just to get the IO performance.

What is the I/O profile of that workload? How did you arrive at the 20-30% number? Which kernel are you running?

Increasing nr_requests doesn't actually increase the queue depth, at least on anything moderately recent. You need to map with queue_depth=X for that; see [1] for details.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b55841807fb864eccca0167650a65722fd7cd553

Thanks,

                Ilya
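[Editor's note: a minimal sketch of the queue_depth map option referenced above. The pool/image names and the depth value are illustrative, not from the thread; the krbd default queue depth is 128 on kernels that carry commit [1].]

    # unmap the image first if it is already mapped
    sudo rbd unmap mypool/myimage

    # remap with an explicit per-device queue depth
    sudo rbd map -o queue_depth=512 mypool/myimage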