Anthony asked about the 'use case'. Well, I hadn't gone into detail because I worried it wouldn't help much. From a Ceph perspective, the sandbox layout goes like this: four pretty much identical old servers, each with six drives, and a smaller server just running a mon to break ties. Usual front-side LAN, separate back-side networking setup. Each of the servers is running a few VMs, all more or less identical for the test case. Each of the VMs is backed by an RBD via userspace libvirt (not kernel-mapped). Each RBD belongs to a pool that is entirely local to the chassis, presently replicated on 3 of the OSDs. One of the smaller VMs runs a mon+mgr per chassis.

Of course, there's also a pool that spans the chassis and does all the usual things userland Ceph is good at, but for these tests I just unplugged all of that. So: run any process that involves a bunch of little writes -- like installing a package or updating an initramfs -- and be ready to sit for a long time.

All the drives are 7200 RPM SATA spinners. CPUs are not overloaded (fewer VMs than cores), no swapping, memory left over. All write-back caching, virtio drives. Ceph Octopus latest, though it's no better than Nautilus performance-wise in this case. Ubuntu LTS/Focal/20.04, I think. Checked all the networking stats: no dropped packets, no overflowing buffers, and anyhow there shouldn't be any significant traffic on the front side, and only Ceph owns the back end. No Ceph problems reported, all PGs active, nothing misplaced, no erasure-coded pools.

So, there's a tiny novel -- thanks for sticking with it!

On 6/29/20 11:12 PM, Anthony D'Atri wrote:
>> Thanks for the thinking. By 'traffic' I mean: when a user space rbd
>> write has as a destination three replica osds in the same chassis
> eek.
>
>> does the whole write get shipped out to the mon and then back
> Mons are control-plane only.
>
>> All the 'usual suspects' like lossy ethernets and miswirings, etc. have
>> been checked.
>> It's actually painful to sit and wait while
>> 'update-initramfs' can take over a minute when the vm is chassis-local
>> to the osds getting the write info.
> You have shared almost none of your hardware or use-case. We know that
> you’re doing convergence, with unspecified CPU, memory, drives. We also
> don’t know how heavy your colocated compute workload is. Since you
> mention update-initramfs, I’ll guess that your workload is VMs with RBD
> volumes attached to libvirt/QEMU? With unspecified RBD cache
> configuration. We also know nothing of your network setup and
> saturation.
>
> I have to suspect that either you’re doing something fundamentally
> wrong, or should just set up a RAID6 volume and carve out LVMs.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
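P.S. For anyone wanting to reproduce the pattern: the pathological workload here is lots of tiny synchronous writes, which is roughly what dpkg and update-initramfs generate. A quick sketch like the one below, run inside a guest on a filesystem backed by the RBD, times that pattern directly. The file path, block size, and write count are made up for illustration, not taken from anything above.

```python
# Rough sketch of the "bunch of little writes" workload: time N small
# fsync'd writes. On replicated Ceph-backed storage, each fsync has to
# be acknowledged by the replica OSDs before the next write proceeds.
# Path, count, and size below are illustrative defaults only.
import os
import time

def time_small_syncs(path, count=256, size=4096):
    """Write `count` blocks of `size` bytes, fsync after each one,
    and return (total_seconds, per_write_milliseconds)."""
    buf = b"\0" * size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.monotonic()
        for _ in range(count):
            os.write(fd, buf)
            os.fsync(fd)  # force the write out before continuing
        total = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)  # clean up the scratch file
    return total, total / count * 1000.0

if __name__ == "__main__":
    total, per_ms = time_small_syncs("/tmp/smallwrite.test")
    print(f"{total:.2f}s total, {per_ms:.2f} ms per fsync'd write")
```

On a local spinner this should report a few milliseconds per fsync'd write; a much larger number inside the VM would put a figure on the latency being described.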