> On Mar 18, 2025, at 2:13 PM, Giovanna Ratini <giovanna.ratini@xxxxxxxxxxxxxxx> wrote:
>
> Hello Anthony,
>
> No, there is no QoS applied to the VMs.
>
> The server has PCIe Gen 4.
>
> ceph osd dump | grep pool
> pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 13.04
> pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 598 lfor 0/598/596 flags hashpspool stripe_width 0 application cephfs read_balance_score 2.02
> pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 50 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs read_balance_score 2.42
> pool 4 'cephvm' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 16386 lfor 0/644/2603 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 1.52
>
> I think this is the default config. 🙈

Yes, that looks like the default, with the autoscaler on. I suggest raising `mon_target_pg_per_osd` to 250. How many OSDs do you have?

>
> I will look for a Supermicro update for my chassis.
>
> Thank you
>
>
> On 18.03.2025 at 17:57, Anthony D'Atri wrote:
>>> Then I tested on the Proxmox host, and the results were significantly better.
>> My Proxmox prowess is limited, but from my experience with other virtualization platforms, I have to ask whether any QoS throttling is applied to the VMs. With OpenStack or DO there is often IOPS and/or throughput throttling via libvirt to mitigate noisy neighbors.
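A minimal POSIX shell sketch of the two checks discussed above; the function names are illustrative and are not part of Ceph or Proxmox. `pg_budget` applies the classic rule of thumb (target PGs per OSD × OSD count ÷ replica size). The autoscaler itself also weighs each pool's share of capacity and its bias, so treat the result as a rough total shared by all pools, not a per-pool value. `disk_throttles` assumes Proxmox records per-disk limits as `iops*`/`mbps*` options on the disk lines printed by `qm config`.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Ceph or Proxmox).
#
# The setting suggested above would be applied with:
#   ceph config set global mon_target_pg_per_osd 250
# and the OSD count can be read with:
#   ceph osd ls | wc -l

# Rough PG budget for replicated pools: every PG places SIZE replicas on OSDs,
# so a per-OSD target of TARGET replicas allows about TARGET * OSDS / SIZE PGs
# in total, shared across all pools (integer arithmetic, rounds down).
pg_budget() {
  target=$1 osds=$2 size=$3
  echo $((target * osds / size))
}

# Round a PG count to the nearest power of two (ties round down).
nearest_pow2() {
  n=$1
  lo=1
  while [ $((lo * 2)) -le "$n" ]; do
    lo=$((lo * 2))
  done
  hi=$((lo * 2))
  if [ $((n - lo)) -le $((hi - n)) ]; then
    echo "$lo"
  else
    echo "$hi"
  fi
}

# Print any per-disk I/O limits (iops*/mbps* options) found in a Proxmox VM
# config read from stdin; no output means no such limits are set.
# Usage on the Proxmox host: qm config <vmid> | disk_throttles
disk_throttles() {
  grep -Eo '(iops|mbps)(_rd|_wr)?(_max)?=[0-9.]+'
}
```

For example, with a hypothetical 12 OSDs and size-3 pools, `pg_budget 250 12 3` prints 1000 and `nearest_pow2 1000` prints 1024.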
>>
>>> fio --name=host-test --filename=/dev/rbd0 --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --iodepth=32 --size=1G --runtime=60 --group_reporting
>>>
>>> - IOPS: 1.54M
>>> - Bandwidth: 6032MiB/s (6325MB/s)
>>> - Latency:
>>>   - Avg: 39.8µs
>>>   - 99.9th percentile: 71µs
>>> - CPU usage: usr=22.60%, sys=77.13%
>>>
>>> On 18.03.2025 at 15:27, Anthony D'Atri wrote:
>>>> Which NVMe drive SKUs specifically?
>>> - /dev/nvme6n1 – KCD61LUL15T3 – 15.36 TB – SN: 6250A02QT5A8
>>> - /dev/nvme5n1 – KCD61LUL15T3 – 15.36 TB – SN: 42R0A036T5A8
>>> - /dev/nvme4n1 – KCD61LUL15T3 – 15.36 TB – SN: 6250A02UT5A8
>> Kioxia CD6. If you were using client-class drives, all manner of performance issues would be expected.
>>
>> Is your server chassis at least PCIe Gen 4? If it's Gen 3, that may hamper these drives.
>>
>> Also, how many of these are in your cluster? If it's a small number, you might still benefit from chopping each into at least 2 separate OSDs.
>>
>> And please send `ceph osd dump | grep pool`; having too few PGs wouldn't do you any favors.
>>
>>>> Are you running a recent kernel?
>>> Penultimate: 6.8.12-8-pve (VM, yes)
>> Groovy. If you were running something like a CentOS 6 or CentOS 7 kernel, NVMe issues might be expected, as old kernels had rudimentary NVMe support.
>>
>>>> Have you updated firmware on the NVMe devices?
>>> No.
>> Kioxia appears not to release firmware updates publicly, but your chassis brand (Dell, HP, SMCI, etc.) might have an update,
>> e.g. https://www.dell.com/support/home/en-vc/drivers/driversdetails?driverid=7ny55
>>
>> If there is an available update, I would strongly suggest applying it.
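The PCIe and firmware questions above can be checked per drive. This is a sketch in the same spirit, assuming root access plus pciutils (`lspci`) and nvme-cli (`nvme`) on the host; the function names are hypothetical. A CD6 in a Gen 4 slot would be expected to negotiate 16GT/s in `LnkSta`; a lower speed than `LnkCap` (recent lspci versions annotate it as downgraded) points at the slot, riser, or backplane. `nvme list` also shows a firmware column that can be compared with what the chassis vendor publishes.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names) for the drive-level checks above.

# Map the negotiated link speed in `lspci -vv` output (stdin) to a PCIe
# generation. A drive's PCI address can usually be found with, for example:
#   basename "$(readlink -f /sys/class/nvme/nvme4/device)"
# then, as root (link details are hidden otherwise):
#   lspci -vv -s <address> | pcie_gen
pcie_gen() {
  speed=$(sed -n 's|.*LnkSta:[[:space:]]*Speed \([0-9.]*\)GT/s.*|\1|p' | head -n 1)
  case "$speed" in
    2.5) echo 1 ;;
    5) echo 2 ;;
    8) echo 3 ;;
    16) echo 4 ;;
    32) echo 5 ;;
    *) echo unknown ;;
  esac
}

# Extract the firmware revision ("fr" field) from `nvme id-ctrl` output (stdin).
# Usage: nvme id-ctrl /dev/nvme4 | nvme_fw_rev
nvme_fw_rev() {
  awk -F':' '/^fr[ \t]/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}
```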
>>
>>> Thanks again,
>>>
>>> Best regards,
>>> Gio
>>>
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx