Hi Giovanna,
I just tested one of my VMs:
# fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=1G --runtime=60 --group_reporting
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.33
Starting 4 processes
registry-read: Laying out IO file (1 file / 1024MiB)
registry-read: Laying out IO file (1 file / 1024MiB)
registry-read: Laying out IO file (1 file / 1024MiB)
registry-read: Laying out IO file (1 file / 1024MiB)
Jobs: 4 (f=0): [f(4)][100.0%][r=33.5MiB/s][r=8578 IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=24261: Thu Mar 20 12:57:26 2025
read: IOPS=8538, BW=33.4MiB/s (35.0MB/s)(2001MiB/60001msec)
slat (usec): min=309, max=4928, avg=464.54, stdev=73.15
clat (nsec): min=602, max=1532.4k, avg=1999.15, stdev=3724.16
lat (usec): min=310, max=4931, avg=466.54, stdev=73.36
clat percentiles (nsec):
|  1.00th=[  812],  5.00th=[  884], 10.00th=[  940], 20.00th=[ 1096],
| 30.00th=[ 1368], 40.00th=[ 1576], 50.00th=[ 1720], 60.00th=[ 1832],
| 70.00th=[ 1944], 80.00th=[ 2096], 90.00th=[ 2480], 95.00th=[ 3024],
| 99.00th=[12480], 99.50th=[15808], 99.90th=[47360], 99.95th=[61696],
| 99.99th=[90624]
bw (  KiB/s): min=30448, max=35868, per=100.00%, avg=34155.76, stdev=269.75, samples=476
iops        : min= 7612, max= 8966, avg=8538.87, stdev=67.43, samples=476
lat (nsec) : 750=0.06%, 1000=14.94%
lat (usec) : 2=59.18%, 4=23.07%, 10=1.28%, 20=1.17%, 50=0.21%
lat (usec) : 100=0.08%, 250=0.01%, 500=0.01%
lat (msec) : 2=0.01%
cpu : usr=1.04%, sys=5.50%, ctx=537639, majf=0, minf=36
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=512316,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=33.4MiB/s (35.0MB/s), 33.4MiB/s-33.4MiB/s (35.0MB/s-35.0MB/s), io=2001MiB (2098MB), run=60001-60001msec
Results are worse than yours, but this is on a production (not very busy) pool with 4x 3.84TB SATA disks (4 disks total vs ~15 disks in your case) and a 10G network.
The VM CPU type is x86-64-v3 and the host CPU is a Ryzen 1700.
I get almost the same IOPS with --iodepth=16.
I tried moving the VM to a host with a Ryzen 5900X and the results are somewhat better:
# fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=1G --runtime=60 --group_reporting --iodepth=16
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.33
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=45.4MiB/s][r=11.6k IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=24282: Thu Mar 20 13:18:23 2025
read: IOPS=11.6k, BW=45.5MiB/s (47.7MB/s)(2730MiB/60001msec)
slat (usec): min=110, max=21206, avg=341.21, stdev=79.69
clat (nsec): min=1390, max=42395k, avg=5147009.08, stdev=475506.40
lat (usec): min=335, max=42779, avg=5488.22, stdev=498.03
clat percentiles (usec):
|  1.00th=[ 4621],  5.00th=[ 4752], 10.00th=[ 4817], 20.00th=[ 4948],
| 30.00th=[ 5014], 40.00th=[ 5080], 50.00th=[ 5080], 60.00th=[ 5145],
| 70.00th=[ 5211], 80.00th=[ 5276], 90.00th=[ 5407], 95.00th=[ 5538],
| 99.00th=[ 6194], 99.50th=[ 6783], 99.90th=[ 9765], 99.95th=[12125],
| 99.99th=[24249]
bw (  KiB/s): min=36434, max=48352, per=100.00%, avg=46612.18, stdev=300.09, samples=476
iops        : min= 9108, max=12088, avg=11653.04, stdev=75.03, samples=476
lat (usec) : 2=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=99.90%, 20=0.08%, 50=0.01%
cpu : usr=0.98%, sys=4.18%, ctx=706399, majf=0, minf=99
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=698956,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=45.5MiB/s (47.7MB/s), 45.5MiB/s-45.5MiB/s (47.7MB/s-47.7MB/s), io=2730MiB (2863MB), run=60001-60001msec
I think we're limited by the IO thread; a quick way to check that is sketched just below. I suggest you try multiple disks with SCSI VirtIO single, so that each disk gets its own IO thread (there's a config sketch after my VM conf).
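To see whether the IO thread really is saturated, you can watch per-thread CPU usage of the VM's QEMU process on the host while fio runs inside the guest. A rough sketch (112 is just my test VMID, and I'm assuming the usual Proxmox pidfile location):

# PID of the QEMU process for VMID 112 (substitute your VMID)
cat /run/qemu-server/112.pid
# Per-thread view: if an iothread sits near 100% of one core while the
# vCPU threads are mostly idle, the IO thread is likely the bottleneck.
top -H -p $(cat /run/qemu-server/112.pid)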
My VM conf:
agent: 1
boot: order=scsi0;ide2;net0
cores: 2
cpu: x86-64-v3
ide2: none,media=cdrom
memory: 2048
meta: creation-qemu=9.0.2,ctime=1739888364
name: elacunza-btrfs-test
net0: virtio=BC:24:11:47:9B:58,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: proxmox_r3_ssd2:vm-112-disk-0,discard=on,iothread=1,size=15G
scsihw: virtio-scsi-single
smbios1: uuid=263ab229-4379-4abf-b6bf-615b98ccd3d4
sockets: 1
vmgenid: 13b7f2a4-2a42-4600-845a-da88f96ae6e8
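For illustration, a multi-disk layout with SCSI VirtIO single could look roughly like this (disk names, storage and sizes are made up, not my real setup; the important part is iothread=1 on every disk so each one gets its own IO thread):

scsihw: virtio-scsi-single
scsi0: proxmox_r3_ssd2:vm-112-disk-0,discard=on,iothread=1,size=15G
scsi1: proxmox_r3_ssd2:vm-112-disk-1,discard=on,iothread=1,size=15G
scsi2: proxmox_r3_ssd2:vm-112-disk-2,discard=on,iothread=1,size=15G

Inside the guest you can then stripe across the disks (LVM or md) or point fio at all of them at once to see whether aggregate IOPS scale.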
I think this is a KVM/QEMU issue, not a Ceph issue :) Maybe you can get better suggestions on the pve-user mailing list.
Cheers
On 20/3/25 at 12:29, Giovanna Ratini wrote:
Hello Eneko,
This is my configuration. The performance is similar across all
VMs. I am now checking GitLab, as that is where people are
complaining the most.
agent: 1
balloon: 65000
bios: ovmf
boot: order=scsi0;net0
cores: 10
cpu: host
efidisk0: cephvm:vm-6506-disk-0,efitype=4m,size=528K
memory: 130000
meta: creation-qemu=9.0.2,ctime=1734995123
name: gitlab02
net0: virtio=BC:24:11:6E:28:71,bridge=vmbr1,firewall=1
numa: 0
ostype: l26
scsi0: cephvm:vm-6506-disk-1,aio=native,cache=writeback,iothread=1,size=64G,ssd=1
scsi1: cephvm:vm-6506-disk-2,aio=native,cache=writeback,iothread=1,size=10T,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=0a5294c0-c82a-40f2-aae4-f5880022a2ac
sockets: 2
vmgenid: ea610fde-6c71-4b7f-9257-fa431a428e16
Cheers,
Gio
On 20.03.2025 at 10:23, Eneko Lacunza wrote:
Hi Giovanna,
Can you post VM's full config?
Also, can you test with IO thread enabled and SCSI virtio single,
and multiple disks?
Cheers
On 19/3/25 at 17:27, Giovanna Ratini wrote:
Hello Eneko,
Yes I did. No significant changes. :-(
Cheers,
Gio
On Wednesday, March 19, 2025 at 13:09 CET, Eneko Lacunza <elacunza@xxxxxxxxx> wrote:
Hi Giovanna,
Have you tried increasing iothreads option for the VM?
Cheers
On 18/3/25 at 19:13, Giovanna Ratini wrote:
> Hello Anthony,
>
> No, there is no QoS applied to the VMs.
>
> The server has PCIe Gen 4.
>
> ceph osd dump | grep pool
> pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 13.04
> pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 598 lfor 0/598/596 flags hashpspool stripe_width 0 application cephfs read_balance_score 2.02
> pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 50 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs read_balance_score 2.42
> pool 4 'cephvm' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 16386 lfor 0/644/2603 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 1.52
>
> I think this is the default config. 🙈
>
> I will look for a Supermicro firmware upgrade for my chassis.
>
> Thank you
>
>
> On 18.03.2025 at 17:57, Anthony D'Atri wrote:
>>> Then I tested on the *Proxmox host*, and the results were
>>> significantly better.
>> My Proxmox prowess is limited, but from my experience with other
>> virtualization platforms, I have to ask if there is any QoS
>> throttling applied to VMs. With OpenStack or DO there is often IOPS
>> and/or throughput throttling via libvirt to mitigate noisy neighbors.
>>
>>> fio --name=host-test --filename=/dev/rbd0 --ioengine=libaio
>>> --rw=randread --bs=4k --numjobs=4 --iodepth=32 --size=1G
>>> --runtime=60 --group_reporting
>>>
>>> *IOPS*: *1.54M*
>>>
>>> # *Bandwidth*: *6032MiB/s (6325MB/s)*
>>> # *Latency*:
>>>
>>> * *Avg*: *39.8µs*
>>> * *99.9th percentile*: *71µs*
>>>
>>> # *CPU Usage*: *usr=22.60%, sys=77.13%*
>>> #
>>>
>>> On 18.03.2025 at 15:27, Anthony D'Atri wrote:
>>>> Which NVMe drive SKUs specifically?
>>> # */dev/nvme6n1* – *KCD61LUL15T3* – 15.36 TB – SN: 6250A02QT5A8
>>> # */dev/nvme5n1* – *KCD61LUL15T3* – 15.36 TB – SN: 42R0A036T5A8
>>> # */dev/nvme4n1* – *KCD61LUL15T3* – 15.36 TB – SN: 6250A02UT5A8
>> Kioxia CD6. If you were using client-class drives all manner of
>> performance issues would be expected.
>>
>> Is your server chassis at least PCIe Gen 4? If it's Gen 3 that may
>> hamper these drives.
>>
>> Also, how many of these are in your cluster? If it's a small number
>> you might still benefit from chopping each into at least 2 separate
>> OSDs.
>>
>> And please send `ceph osd dump | grep pool`, having too few PGs
>> wouldn’t do you any favors.
>>
>>
>>>> Are you running a recent kernel?
>>> penultimate: 6.8.12-8-pve (VM, yes)
>> Groovy. If you were running like a CentOS 6 or CentOS 7 kernel then
>> NVMe issues might be expected as old kernels had rudimentary NVMe
>> support.
>>
>>>> Have you updated firmware on the NVMe devices?
>>> No.
>> Kioxia appears to not release firmware updates publicly but your
>> chassis brand (Dell, HP, SMCI, etc) might have an update.
>>
>> e.g. https://www.dell.com/support/home/en-vc/drivers/driversdetails?driverid=7ny55
>>
>>
>> If there is an available update I would strongly suggest applying.
>
>>
>>> Thanks again,
>>>
>>> best regards,
>>> Gio
>>>
Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx