Hi folks,

The company I recently joined has a Proxmox cluster of 4 hosts with a Ceph implementation that was set up using the Proxmox GUI. It is running terribly, and as a Ceph newbie I'm trying to figure out if the configuration is at fault. I'd really appreciate some help and guidance on this please.

The symptoms:

* Really slow read/write performance
* Really, really slow rebalancing/backfill
* High apply/commit latency on a couple of the SSDs when under load
* Knock-on performance hit on key VMs (particularly AD/DNS services) that affects user experience

The setup is as follows: 4 hosts, of which 3 are Dell R820s with 4-socket Xeons, 96 cores and 1.5 TB RAM each. The other (host 4) has a Ryzen 7 5800 processor with 64 GB RAM. All servers are on a simple 10GbE network with dedicated NICs on a separate subnet. The SSDs in use are a combination of new Seagate IronWolf 125 1TB SSDs and older Crucial MX500 1TB and WD Blue 1TB drives. I know some of these are consumer-class, and I'm working on replacing them. I believe the OSDs were added to Proxmox's Ceph implementation with the default settings, i.e. DB and WAL on the same OSD. All 4 hosts are set up as monitors, and the 3 beefy ones as managers and metadata servers. The Ceph version is 16.2.7.

Here is the config:

[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 192.168.8.4/24
        fsid = 4a4b4fff-d140-4e11-a35b-cbac0e18a3ce
        mon_allow_pool_delete = true
        mon_host = 192.168.8.4 192.168.8.6 192.168.8.5 192.168.8.3
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_memory_target = 2147483648
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 192.168.8.4/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
        keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.cl1-h1-lv]
        host = cl1-h1-lv
        mds_standby_for_name = pve

[mds.cl1-h2-lv]
        host = cl1-h2-lv
        mds_standby_for_name = pve

[mds.cl1-h3-lv]
        host = cl1-h3-lv
        mds_standby_for_name = pve

[mon.cl1-h1-lv]
        public_addr = 192.168.8.3

[mon.cl1-h2-lv]
        public_addr = 192.168.8.4

[mon.cl1-h3-lv]
        public_addr = 192.168.8.5

[mon.cl1-h4-lv]
        public_addr = 192.168.8.6

And the CRUSH map:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class ssd
device 10 osd.10 class ssd
device 11 osd.11 class ssd
device 12 osd.12 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host cl1-h2-lv {
        id -3           # do not change unnecessarily
        id -4 class ssd         # do not change unnecessarily
        # weight 2.729
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 0.910
        item osd.5 weight 0.910
        item osd.10 weight 0.910
}
host cl1-h3-lv {
        id -5           # do not change unnecessarily
        id -6 class ssd         # do not change unnecessarily
        # weight 2.729
        alg straw2
        hash 0  # rjenkins1
        item osd.1 weight 0.910
        item osd.6 weight 0.910
        item osd.11 weight 0.910
}
host cl1-h4-lv {
        id -7           # do not change unnecessarily
        id -8 class ssd         # do not change unnecessarily
        # weight 1.819
        alg straw2
        hash 0  # rjenkins1
        item osd.7 weight 0.910
        item osd.2 weight 0.910
}
host cl1-h1-lv {
        id -9           # do not change unnecessarily
        id -10 class ssd        # do not change unnecessarily
        # weight 3.639
        alg straw2
        hash 0  # rjenkins1
        item osd.4 weight 0.910
        item osd.8 weight 0.910
        item osd.9 weight 0.910
        item osd.12 weight 0.910
}
root default {
        id -1           # do not change unnecessarily
        id -2 class ssd         # do not change unnecessarily
        # weight 10.916
        alg straw2
        hash 0  # rjenkins1
        item cl1-h2-lv weight 2.729
        item cl1-h3-lv weight 2.729
        item cl1-h4-lv weight 1.819
        item cl1-h1-lv weight 3.639
}

# rules
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map

Based on some reading, I'm starting to understand a little about what can be tweaked. For example, I think the osd_memory_target looks low (it's set to 2 GB here, and I believe the upstream default is 4 GB). I also think the DB/WAL should be on dedicated disks or partitions, but I'm not sure of the correct procedure for doing that.
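From the reading I've done so far, my best guess at the procedure for those two changes is below, but I haven't tested any of it, so please correct me if it's wrong. For the memory target, my understanding is that a value in ceph.conf overrides anything set with "ceph config set", so I'd edit /etc/pve/ceph.conf and then restart the OSDs on each host one at a time:

    # /etc/pve/ceph.conf, [global] section - raise to the 4 GiB default
    osd_memory_target = 4294967296

    # then on each host, for each OSD id, one at a time
    systemctl restart ceph-osd@<id>.service

For moving the DB/WAL to a dedicated device, the two options I've come across are re-creating each OSD through Proxmox with a separate DB device, or attaching a new DB LV to an existing OSD with ceph-volume. Both of these are untested guesses on my part, and the device names and IDs below are just placeholders:

    # Option A: mark the OSD out, let the data drain, stop it, then destroy
    # and re-create it via Proxmox with a separate DB device, waiting for
    # HEALTH_OK before moving on to the next one
    pveceph osd destroy <id> --cleanup
    pveceph osd create /dev/sd<X> --db_dev /dev/<fast-device>

    # Option B: attach a new DB logical volume to an existing OSD
    # (OSD stopped first; <vg>/<db-lv> is an LV created on the new DB disk)
    ceph-volume lvm new-db --osd-id <id> --osd-fsid <osd-fsid> --target <vg>/<db-lv>
    ceph-volume lvm migrate --osd-id <id> --osd-fsid <osd-fsid> --from data --target <vg>/<db-lv>

Does that look roughly right, or is there a better way?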
I'm actually thinking that the best bet would be to copy the VMs to temporary storage (as there is only about 7 TB's worth) and then set up Ceph from scratch following some kind of best-practice guide.

Anyway, any help would be gratefully received. Thanks for reading.

Kind regards,
Tino Todino
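P.S. I'm happy to gather more data if that helps. My plan was to collect some baseline numbers with the standard tools, roughly as follows (the pool name and device are placeholders):

    ceph osd perf          # per-OSD commit/apply latency
    ceph osd df tree       # utilisation and PG distribution per OSD/host
    rados bench -p <testpool> 30 write --no-cleanup
    rados bench -p <testpool> 30 rand
    rados -p <testpool> cleanup

and, on a spare/blank drive only (this writes to the raw device and is destructive), a 4k sync-write test to see how the consumer SSDs cope with the kind of I/O BlueStore generates:

    fio --name=synctest --filename=/dev/sd<X> --direct=1 --sync=1 \
        --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based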