Hello Alvaro,

They are /not/ behind a traditional hardware RAID controller. No RAID controller is present; I'm working with native NVMe SSDs on individual PCIe lanes.
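For completeness, this is how the presentation to the kernel can be confirmed (a minimal sketch; it assumes the nvme-cli package is installed and the device names are only examples):

# Transport per block device: native NVMe shows TRAN "nvme";
# drives hidden behind a RAID/HBA usually appear as sd* devices with "sas"/"sata"
lsblk -d -o NAME,TRAN,MODEL,SIZE

# Controller, namespace and firmware details straight from the NVMe driver
nvme list

# Block-layer queue depth exposed for one device (example name)
cat /sys/block/nvme0n1/queue/nr_requests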
Cheers,
Gio

On 21.03.2025 at 09:27, Alvaro Soto wrote:
Digging in a different direction: I have a question, are the drives connected to a RAID array card? And how are they presented? I don't recall where I read something about a RAID card presenting drives to the kernel as SCSI instead of NVMe, and the queue depth being the issue.

Cheers.

--
Alvaro Soto

Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.
----------------------------------------------------------
Great people talk about ideas, ordinary people talk about things, small people talk... about other people

On Thu, Mar 20, 2025, 2:13 PM Giovanna Ratini <giovanna.ratini@xxxxxxxxxxxxxxx> wrote:

Hello,

Yes, I will test KRBD. I will be on holiday next week, so I don't want to make any changes before then. Could you wait until 29.3? This is a production environment, and restoring a backup would take time. Or do you think there's no risk in making the change?

Thank you, best regards,
Gio

On 20.03.2025 at 16:57, Eneko Lacunza wrote:

Hi Chris,

I tried KRBD, even with a newly created disk and after shutting down and starting the VM again, but no measurable difference. Our Ceph is 18.2.4, that may be a factor to consider, but 9k -> 273k?!

Maybe Giovanna can test the KRBD option and report back... :)

Cheers

On 20/3/25 at 16:19, Chris Palmer wrote:

Hi Eneko,

No containers. In the Proxmox console go to Datacenter\Storage, click on the storage you are using, then Edit. There is a tick box KRBD. With that set, any virtual disks created in that storage will use KRBD rather than librbd. So it applies to all VMs that use that storage.

Chris
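For reference, the same switch can also be set from the shell or inspected in /etc/pve/storage.cfg. A minimal sketch, assuming an RBD storage named cephvm as elsewhere in this thread (check the option name against your PVE version):

# Enable the kernel RBD client for an existing RBD storage (assumed name: cephvm)
pvesm set cephvm --krbd 1

# Equivalent /etc/pve/storage.cfg stanza
rbd: cephvm
        content images
        pool cephvm
        krbd 1

Note that, as Eneko's test above suggests, the setting only takes effect once a disk is re-activated, e.g. after stopping and starting the VM.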
On 20/03/2025 15:00, Eneko Lacunza wrote:

Chris, did you test from a container? Or how do you configure a KRBD disk for a VM?

On 20/3/25 at 15:15, Chris Palmer wrote:

I just ran that command on one of my VMs. Salient details:

* Ceph cluster 19.2.1 with 3 nodes, 4 x SATA disks with shared NVMe DB/WAL, single 10g NICs
* Proxmox 8.3.5 cluster with 2 nodes (separate nodes to Ceph), single 10g NICs, single 1g NICs for corosync
* Test VM was using a KRBD R3 pool on HDD, iothread=1, aio=io_uring, cache=writeback

The results are very different:

# fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=1G --runtime=60 --group_reporting --iodepth=16
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.37
Starting 4 processes
Jobs: 4 (f=4): [r(4)][-.-%][r=1080MiB/s][r=277k IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=13355: Thu Mar 20 13:57:05 2025
  read: IOPS=273k, BW=1068MiB/s (1120MB/s)(4096MiB/3835msec)
    slat (usec): min=7, max=3802, avg=13.77, stdev= 6.41
    clat (nsec): min=599, max=4395.1k, avg=215298.68, stdev=38131.71
     lat (usec): min=11, max=4408, avg=229.07, stdev=40.01
    clat percentiles (usec):
     |  1.00th=[  194],  5.00th=[  200], 10.00th=[  202], 20.00th=[  204],
     | 30.00th=[  206], 40.00th=[  208], 50.00th=[  210], 60.00th=[  212],
     | 70.00th=[  215], 80.00th=[  217], 90.00th=[  227], 95.00th=[  243],
     | 99.00th=[  367], 99.50th=[  420], 99.90th=[  594], 99.95th=[  668],
     | 99.99th=[  963]
   bw (  MiB/s): min=  920, max= 1118, per=100.00%, avg=1068.04, stdev=16.81, samples=28
   iops        : min=235566, max=286286, avg=273417.14, stdev=4303.79, samples=28
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 20=0.01%, 50=0.01%, 100=0.01%, 250=96.06%, 500=3.67%
  lat (usec)   : 750=0.24%, 1000=0.02%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=4.68%, sys=29.99%, ctx=1048987, majf=0, minf=102
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1048576,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=1068MiB/s (1120MB/s), 1068MiB/s-1068MiB/s (1120MB/s-1120MB/s), io=4096MiB (4295MB), run=3835-3835msec

Disk stats (read/write):
  sdc: ios=999346/0, sectors=7994768/0, merge=0/0, ticks=10360/0, in_queue=10361, util=95.49%
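One way to tell whether such a gap lives in the Ceph backend or in the VM I/O path is to benchmark the pool and image directly from a Ceph or Proxmox node, bypassing QEMU. A minimal sketch, reusing the pool and image names that appear later in this thread (cephvm, vm-6506-disk-1); adjust the names and be careful on a production pool:

# Raw object writes/reads against the pool (leave data for the read pass, then clean up)
rados bench -p cephvm 30 write --no-cleanup
rados bench -p cephvm 30 rand
rados -p cephvm cleanup

# 4k random reads against the RBD image itself via librbd
rbd bench --io-type read --io-pattern rand --io-size 4K --io-threads 16 --io-total 1G cephvm/vm-6506-disk-1

If these numbers come out high while the in-VM fio stays around 9k IOPS, the bottleneck is in the QEMU/librbd path (iothread, aio, single virtqueue) rather than in the OSDs.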
On 20/03/2025 12:23, Eneko Lacunza wrote:

Hi Giovanna,

I just tested one of my VMs:

# fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=1G --runtime=60 --group_reporting
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.33
Starting 4 processes
registry-read: Laying out IO file (1 file / 1024MiB)
registry-read: Laying out IO file (1 file / 1024MiB)
registry-read: Laying out IO file (1 file / 1024MiB)
registry-read: Laying out IO file (1 file / 1024MiB)
Jobs: 4 (f=0): [f(4)][100.0%][r=33.5MiB/s][r=8578 IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=24261: Thu Mar 20 12:57:26 2025
  read: IOPS=8538, BW=33.4MiB/s (35.0MB/s)(2001MiB/60001msec)
    slat (usec): min=309, max=4928, avg=464.54, stdev=73.15
    clat (nsec): min=602, max=1532.4k, avg=1999.15, stdev=3724.16
     lat (usec): min=310, max=4931, avg=466.54, stdev=73.36
    clat percentiles (nsec):
     |  1.00th=[  812],  5.00th=[  884], 10.00th=[  940], 20.00th=[ 1096],
     | 30.00th=[ 1368], 40.00th=[ 1576], 50.00th=[ 1720], 60.00th=[ 1832],
     | 70.00th=[ 1944], 80.00th=[ 2096], 90.00th=[ 2480], 95.00th=[ 3024],
     | 99.00th=[12480], 99.50th=[15808], 99.90th=[47360], 99.95th=[61696],
     | 99.99th=[90624]
   bw (  KiB/s): min=30448, max=35868, per=100.00%, avg=34155.76, stdev=269.75, samples=476
   iops        : min= 7612, max= 8966, avg=8538.87, stdev=67.43, samples=476
  lat (nsec)   : 750=0.06%, 1000=14.94%
  lat (usec)   : 2=59.18%, 4=23.07%, 10=1.28%, 20=1.17%, 50=0.21%
  lat (usec)   : 100=0.08%, 250=0.01%, 500=0.01%
  lat (msec)   : 2=0.01%
  cpu          : usr=1.04%, sys=5.50%, ctx=537639, majf=0, minf=36
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=512316,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=33.4MiB/s (35.0MB/s), 33.4MiB/s-33.4MiB/s (35.0MB/s-35.0MB/s), io=2001MiB (2098MB), run=60001-60001msec

Results are worse than yours, but this is on a production (not very busy) pool with 4x3.84TB SATA disks (4 disks total vs ~15 disks in your case) and a 10G network. The VM CPU is x86-64-v3 and the host CPU a Ryzen 1700. I get almost the same IOPS with --iodepth=16.
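A quick sanity check on these numbers, using the latencies reported above: with iodepth=1, throughput is bounded by per-IO completion latency, so roughly

    4 jobs x (1 / ~466 µs per IO) ≈ 8,580 IOPS

which matches the ~8.5k measured. That iodepth=16 barely moves the result suggests the librbd/iothread path keeps close to one request in flight per job, consistent with the IO-thread limit discussed below.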
I tried moving the VM to a Ryzen 5900X and results are somewhat better:

# fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=1G --runtime=60 --group_reporting --iodepth=16
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.33
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=45.4MiB/s][r=11.6k IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=24282: Thu Mar 20 13:18:23 2025
  read: IOPS=11.6k, BW=45.5MiB/s (47.7MB/s)(2730MiB/60001msec)
    slat (usec): min=110, max=21206, avg=341.21, stdev=79.69
    clat (nsec): min=1390, max=42395k, avg=5147009.08, stdev=475506.40
     lat (usec): min=335, max=42779, avg=5488.22, stdev=498.03
    clat percentiles (usec):
     |  1.00th=[ 4621],  5.00th=[ 4752], 10.00th=[ 4817], 20.00th=[ 4948],
     | 30.00th=[ 5014], 40.00th=[ 5080], 50.00th=[ 5080], 60.00th=[ 5145],
     | 70.00th=[ 5211], 80.00th=[ 5276], 90.00th=[ 5407], 95.00th=[ 5538],
     | 99.00th=[ 6194], 99.50th=[ 6783], 99.90th=[ 9765], 99.95th=[12125],
     | 99.99th=[24249]
   bw (  KiB/s): min=36434, max=48352, per=100.00%, avg=46612.18, stdev=300.09, samples=476
   iops        : min= 9108, max=12088, avg=11653.04, stdev=75.03, samples=476
  lat (usec)   : 2=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=99.90%, 20=0.08%, 50=0.01%
  cpu          : usr=0.98%, sys=4.18%, ctx=706399, majf=0, minf=99
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=698956,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=45.5MiB/s (47.7MB/s), 45.5MiB/s-45.5MiB/s (47.7MB/s-47.7MB/s), io=2730MiB (2863MB), run=60001-60001msec

I think we're limited by the IO thread. I suggest you try multiple disks with SCSI VirtIO single. My VM conf:

agent: 1
boot: order=scsi0;ide2;net0
cores: 2
cpu: x86-64-v3
ide2: none,media=cdrom
memory: 2048
meta: creation-qemu=9.0.2,ctime=1739888364
name: elacunza-btrfs-test
net0: virtio=BC:24:11:47:9B:58,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: proxmox_r3_ssd2:vm-112-disk-0,discard=on,iothread=1,size=15G
scsihw: virtio-scsi-single
smbios1: uuid=263ab229-4379-4abf-b6bf-615b98ccd3d4
sockets: 1
vmgenid: 13b7f2a4-2a42-4600-845a-da88f96ae6e8

I think this is a KVM/QEMU issue, not a Ceph issue :) Maybe you can get better suggestions on the pve-user mailing list.

Cheers
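To make the multiple-disks suggestion concrete, a minimal sketch of what that could look like with qm (the VM ID, storage name and sizes are taken from the configs in this thread and are only examples; with scsihw: virtio-scsi-single and iothread=1, each scsiN disk gets its own I/O thread, so the guest can stripe across them, e.g. with LVM or md):

# Add extra virtual disks on the same RBD storage, each with its own iothread
qm set 6506 --scsi2 cephvm:2048,iothread=1,cache=writeback,ssd=1
qm set 6506 --scsi3 cephvm:2048,iothread=1,cache=writeback,ssd=1

# Make sure the controller that spawns one thread per disk is in use
qm set 6506 --scsihw virtio-scsi-single

It may also be worth comparing aio=io_uring (as in Chris's test VM above) against the aio=native,cache=writeback combination used in the gitlab02 config below.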
On 20/3/25 at 12:29, Giovanna Ratini wrote:

Hello Eneko,

this is my configuration. The performance is similar across all VMs. I am now checking GitLab, as that is where people are complaining the most.

agent: 1
balloon: 65000
bios: ovmf
boot: order=scsi0;net0
cores: 10
cpu: host
efidisk0: cephvm:vm-6506-disk-0,efitype=4m,size=528K
memory: 130000
meta: creation-qemu=9.0.2,ctime=1734995123
name: gitlab02
net0: virtio=BC:24:11:6E:28:71,bridge=vmbr1,firewall=1
numa: 0
ostype: l26
scsi0: cephvm:vm-6506-disk-1,aio=native,cache=writeback,iothread=1,size=64G,ssd=1
scsi1: cephvm:vm-6506-disk-2,aio=native,cache=writeback,iothread=1,size=10T,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=0a5294c0-c82a-40f2-aae4-f5880022a2ac
sockets: 2
vmgenid: ea610fde-6c71-4b7f-9257-fa431a428e16

Cheers,
Gio

On 20.03.2025 at 10:23, Eneko Lacunza wrote:

Hi Giovanna,

Can you post the VM's full config? Also, can you test with IO thread enabled and SCSI VirtIO single, and multiple disks?

Cheers

On 19/3/25 at 17:27, Giovanna Ratini wrote:

Hello Eneko,

Yes I did. No significant changes. :-(

Cheers,
Gio

On Wednesday, March 19, 2025, 13:09 CET, Eneko Lacunza <elacunza@xxxxxxxxx> wrote:

Hi Giovanna,

Have you tried increasing the iothreads option for the VM?

Cheers

On 18/3/25 at 19:13, Giovanna Ratini wrote:

Hello Anthony,

no, no QoS is applied to the VMs. The server has PCIe Gen 4.

ceph osd dump | grep pool
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 13.04
pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 598 lfor 0/598/596 flags hashpspool stripe_width 0 application cephfs read_balance_score 2.02
pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 50 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs read_balance_score 2.42
pool 4 'cephvm' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 16386 lfor 0/644/2603 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 1.52

I think this is the default config. 🙈 I will look for an upgrade for my Supermicro chassis.

Thank you
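Regarding Anthony's point below about PG counts (the 'cephvm' pool above has pg_num 128): a couple of read-only commands show whether the autoscaler considers that adequate, plus the knob to raise it if not. A sketch; raising pg_num triggers data movement, so do it deliberately:

# What the autoscaler thinks pg_num should be for each pool
ceph osd pool autoscale-status

# Current PG count for the RBD pool
ceph osd pool get cephvm pg_num

# If needed (e.g. many NVMe OSDs and only 128 PGs), raise it explicitly
ceph osd pool set cephvm pg_num 256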
On 18.03.2025 at 17:57, Anthony D'Atri wrote:

Then I tested on the *Proxmox host*, and the results were significantly better.

My Proxmox prowess is limited, but from my experience with other virtualization platforms, I have to ask if there is any QoS throttling applied to VMs. With OpenStack or DO there is often IOPS and/or throughput throttling via libvirt to mitigate noisy neighbors.

fio --name=host-test --filename=/dev/rbd0 --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --iodepth=32 --size=1G --runtime=60 --group_reporting
*IOPS*: *1.54M*
*Bandwidth*: *6032MiB/s (6325MB/s)*
*Latency*: *Avg*: *39.8µs*, *99.9th percentile*: *71µs*
*CPU Usage*: *usr=22.60%, sys=77.13%*

On 18.03.2025 at 15:27, Anthony D'Atri wrote:

Which NVMe drive SKUs specifically?

*/dev/nvme6n1* – *KCD61LUL15T3* – 15.36 TB – SN:6250A02QT5A8
*/dev/nvme5n1* – *KCD61LUL15T3* – 15.36 TB – SN:42R0A036T5A8
*/dev/nvme4n1* – *KCD61LUL15T3* – 15.36 TB – SN:6250A02UT5A8

Kioxia CD6. If you were using client-class drives, all manner of performance issues would be expected.

Is your server chassis at least PCIe Gen 4? If it's Gen 3, that may hamper these drives.

Also, how many of these are in your cluster? If it's a small number you might still benefit from chopping each into at least 2 separate OSDs.

And please send `ceph osd dump | grep pool`; having too few PGs wouldn't do you any favors.

Are you running a recent kernel?

penultimate: 6.8.12-8-pve (VM, yes)

Groovy. If you were running something like a CentOS 6 or CentOS 7 kernel, then NVMe issues might be expected, as old kernels had rudimentary NVMe support.

Have you updated firmware on the NVMe devices?

No.

Kioxia appears to not release firmware updates publicly, but your chassis brand (Dell, HP, SMCI, etc.) might have an update, e.g. https://www.dell.com/support/home/en-vc/drivers/driversdetails?driverid=7ny55

If there is an available update I would strongly suggest applying it.

Thanks again, best regards,
Gio
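On the firmware point, the currently installed revision is easy to read before going hunting for vendor packages. A minimal sketch (assumes nvme-cli and smartmontools are installed; device names are examples):

# Firmware revision column for all controllers
nvme list

# Firmware revision ("fr") and firmware slot log for one controller
nvme id-ctrl /dev/nvme0 | grep -i '^fr '
nvme fw-log /dev/nvme0

# SMART/health view, which also reports the firmware version
smartctl -a /dev/nvme0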
_______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx