Hi,

I just upgraded from Infernalis to Jewel and see an approximately 10x latency increase.

Quick facts:
- 3x replicated pool
- 4 nodes, each with 2x Intel Xeon E5-2690 v3 @ 2.60 GHz, 128 GB RAM, and 6x 1.6 TB Intel S3610 SSDs
- LSI 3008 controller with up-to-date firmware and the upstream driver; up-to-date firmware on the SSDs as well
- 40GbE (Mellanox, with up-to-date drivers and firmware)
- CentOS 7.2

The physical layer checks out, verified with iperf3 for the network and e.g. fio across all the SSDs. I haven't done much Linux tuning yet, but irqbalance does a pretty good job of pairing both the NIC and the HBA with their respective CPUs.

I'm in performance-hunting mode, and today I took the next logical step of upgrading from Infernalis to Jewel.

The tester is a remote KVM/Qemu/libvirt guest (OpenStack), a CentOS 7 image running fio. The test scenario is 4K random write, libaio, direct I/O, QD=1, runtime=900 s, test file size 40 GiB (a sketch of an equivalent fio invocation is appended after the links below).

The results went from [1] to [2]. In [1], the guest saw 98.25% of the I/Os complete within at most 250 µs (~4000 IOPS, since at QD=1 IOPS is roughly 1/latency). In [2], 98.95% of the I/Os complete at ~4 ms (actually ~300 IOPS). Between [1] and [2] (simple plots of fio's end-to-end latency metrics), the entire cluster, including the compute nodes, went from Infernalis to Jewel 10.2.2.

What's going on here? I haven't tuned the Ceph OSDs at all yet, either in their config or via the Linux kernel; the upgrade to Jewel came first. I haven't changed any OSD configs between [1] and [2] myself (only minimally before [1], with zero effort spent on performance tuning), other than updating to the Jewel tunables. But the difference is very drastic, wouldn't you say?

Best,
Martin

[1] http://martin.millnert.se/ceph/pngs/guest-ceph-fio-bench/test08/ceph-fio-bench_lat.1.png
[2] http://martin.millnert.se/ceph/pngs/guest-ceph-fio-bench/test10/ceph-fio-bench_lat.1.png
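
P.S. For anyone who wants to reproduce the test: the scenario above corresponds roughly to the fio invocation below. This is a sketch, not the exact job I ran; the filename is a placeholder and time_based is my assumption, while the remaining parameters mirror the ones listed above.

    # 4K random writes, direct I/O via libaio, QD=1, 900 s runtime
    # (time_based assumed), against a 40 GiB test file.
    # The filename is a placeholder; point it at a file inside the guest.
    fio --name=randwrite-qd1 \
        --ioengine=libaio --direct=1 \
        --rw=randwrite --bs=4k --iodepth=1 \
        --size=40g --runtime=900 --time_based \
        --filename=/mnt/test/fio-testfile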