[ too much to quote ]

Dense nodes often work better for object-focused workloads than for block-focused ones: with object storage the impact of delayed operations is simply reduced speed, whereas with block storage it can be a tenant VM crashing.

Re RAID5 volumes to decrease the number of OSDs: This sort of approach is getting increasing attention in that it brings down the OSD count, reducing the resource demands of peering, especially during storms. It also makes the OSD fillage bell curve narrower. But one must also consider that the write speed of a RAID5 group is that of a single drive due to the parity recalc, and that if one does not adjust osd_op_threads and osd_disk_threads, throughput can suffer because fewer ops can run across the cluster at the same time.

Re Intel P3700 NVMe cards: has anyone out there experienced reset issues that may be related to workload, kernel version, driver version, firmware version, etc.? Or even Firefly vs Hammer?

There was an excellent presentation at the Austin OpenStack Summit re optimizing dense nodes: pinning OSD processes, HBA/NIC interrupts, etc. to cores/sockets to limit data sent over QPI links on NUMA architectures. It's easy to believe that modern inter-die links are Fast Enough For You Old Man, but there's more to it.

-- Anthony
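To put rough numbers on the RAID5 point about fewer ops running across the cluster at once, here is a minimal back-of-the-envelope sketch. The drive count, group size, and per-OSD thread count are hypothetical values picked for illustration, and the function names are mine. It also assumes cluster-wide concurrency scales linearly with the per-OSD thread setting, which is a simplification worth testing on real hardware rather than a tuning recommendation.

```python
def cluster_op_threads(drives: int, drives_per_osd: int, op_threads_per_osd: int) -> int:
    """Total op threads across the cluster when each OSD sits on
    `drives_per_osd` drives (1 = one OSD per drive, 5 = 5-drive RAID5 group)."""
    if drives_per_osd < 1 or drives % drives_per_osd:
        raise ValueError("drives must divide evenly into groups")
    osds = drives // drives_per_osd
    return osds * op_threads_per_osd


def op_threads_to_match(drives_per_osd: int, baseline_threads: int) -> int:
    """Per-OSD thread count that keeps the cluster-wide total the same as
    one OSD per drive running `baseline_threads` each (linear assumption)."""
    return drives_per_osd * baseline_threads


# Hypothetical cluster: 360 drives, 2 op threads per OSD (illustrative value).
per_drive = cluster_op_threads(360, 1, 2)   # 360 OSDs * 2 threads = 720
raid5 = cluster_op_threads(360, 5, 2)       # 72 OSDs * 2 threads = 144
needed = op_threads_to_match(5, 2)          # 10 threads per RAID5-backed OSD
```

With these made-up numbers, consolidating into 5-drive groups without touching the thread settings cuts the cluster-wide thread total from 720 to 144, which is the effect described above.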