Folks,

we're building a Ceph cluster based on HDDs with SSDs for the WAL/DB. We have four nodes with 8TB disks and two SSDs each, and four nodes with many small HDDs (1.4-2.7TB) and four SSDs for the journals. The HDDs are configured as RAID 0 on the controllers with writethrough enabled.

I am writing this e-mail because we see absolutely catastrophic performance on the cluster (anything from literally no throughput for seconds up to 200 MB/s, wildly varying). We've checked every single layer: the network is far from saturated (we have 25 Gbit/s uplinks and can confirm that they deliver 25 Gbit/s). Using iostat, we can show that during a "rados bench" run neither the SSDs nor the actual hard disks come anywhere near 100% disk utilization; usage usually does not exceed 55%. The servers are Dell RX740d with onboard PERC controllers.

We also have four SSD-only nodes. When benchmarking against a pool on those, I reliably get 400-500 MB/s with the same 64k test that I ran against the HDD pool.

We've tried a number of things without major success, such as enabling the BBWC on the RAID controllers. "ceph tell osd.X bench" shows only 55-100 IOPS for the HDD OSDs even though their journals are on SSDs. We have also tried disabling the write cache on the journal SSDs, again to no avail.

Any pointer to what we may have overlooked would be greatly appreciated.

Best regards
Martin
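
P.S. In case it helps to reproduce the numbers above, the commands we are running look roughly like this (the pool name, OSD id and device name are only examples, adjust for your setup):

    # 64k write benchmark, run against both the HDD-backed pool and the SSD-only pool
    rados bench -p bench-hdd 60 write -b 65536 -t 16 --no-cleanup

    # per-OSD benchmark; the HDD OSDs report 55-100 IOPS here
    ceph tell osd.12 bench

    # disk utilization on the OSD nodes while the benchmark is running
    iostat -x 1

    # check the on-disk write cache state of a journal SSD
    hdparm -W /dev/sdb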