Re: Experience with 100G Ceph in Proxmox

Hi,

These are very impressive results! On HDD, even!

Here are results on my cluster:


| 4k randread | no cache | writeback | unsafe  |
| ----------- | -------- | --------- | ------- |
| RBD         | 40MB/s   | 40MB/s    | ?       |
| KRBD        | 40MB/s   | 245MB/s   | 245MB/s |


Cluster: 8 Proxmox nodes, 6 of them hosting Ceph OSDs (17 NVMe OSDs total), dual 25G connections, running Ceph 18.2.4 and PVE 8.2.7.

VM with 4 vCPUs, 8 GB RAM, using krbd, cache=writeback, ext4 directly on a VirtIO SCSI single controller, iothread=1.
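
For reference, a rough sketch of how that maps onto the Proxmox config (storage/pool names and IDs below are just placeholders, not my actual ones): krbd is toggled on the storage definition, cache and iothread on the VM disk:

/etc/pve/storage.cfg:
  rbd: ceph-rbd
      pool rbd
      content images
      krbd 1

VM config:
  scsihw: virtio-scsi-single
  scsi0: ceph-rbd:vm-100-disk-0,cache=writeback,iothread=1,size=32G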

40 MB/s is about 10k IOPS, with ~10 ms typical latency.

245 MB/s is about 55k IOPS, with ~1 ms typical latency.
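
(For context on how those numbers relate: at bs=4k, IOPS is roughly bandwidth / 4 KiB, and with numjobs=4 x iodepth=16 = 64 I/Os in flight, average latency is roughly 64 / IOPS. The no-cache fio log below shows ~10.2k IOPS at ~6.2 ms average completion latency (64 / 10200 ≈ 6.3 ms), and the cached re-run shows ~73k IOPS at ~0.86 ms (64 / 73000 ≈ 0.88 ms).)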

It also depends on the cache in the VM: freshly after a reboot, the krbd disk with writeback did 56 MB/s on the first run, then 300 MB/s when re-running. That is better than the corresponding results in the table above, where I ran all tests in sequence; I suspect this is because no cache eviction had to be done.
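
A cache-cold run can also be reproduced without rebooting (standard Linux knobs, run as root inside the guest; this only drops the guest page cache, not any host-side/librbd caching):

# sync; echo 3 > /proc/sys/vm/drop_caches
# fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k \
    --numjobs=4 --size=1G --runtime=60 --group_reporting --iodepth=16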

Best,


=============== KRBD_WRITEBACK ============

registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.33
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=204MiB/s][r=52.3k IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=1083: Fri Mar 21 18:03:09 2025
  read: IOPS=13.6k, BW=53.2MiB/s (55.8MB/s)(3195MiB/60001msec)
    slat (usec): min=27, max=24503, avg=291.01, stdev=270.71
    clat (usec): min=2, max=30294, avg=4402.54, stdev=2199.54
     lat (usec): min=52, max=30621, avg=4693.55, stdev=2333.21
    clat percentiles (usec):
     |  1.00th=[  799],  5.00th=[  914], 10.00th=[ 1254], 20.00th=[ 1926],
     | 30.00th=[ 2868], 40.00th=[ 3949], 50.00th=[ 4883], 60.00th=[ 5538],
     | 70.00th=[ 5932], 80.00th=[ 6259], 90.00th=[ 6652], 95.00th=[ 7242],
     | 99.00th=[ 9503], 99.50th=[10552], 99.90th=[13304], 99.95th=[14746],
     | 99.99th=[19006]
   bw (  KiB/s): min=30440, max=209488, per=97.68%, avg=53257.61, stdev=8053.45, samples=476
   iops        : min= 7610, max=52372, avg=13314.40, stdev=2013.36, samples=476
  lat (usec)   : 4=0.01%, 100=0.01%, 250=0.01%, 500=0.01%, 750=0.20%
  lat (usec)   : 1000=5.87%
  lat (msec)   : 2=14.87%, 4=19.56%, 10=58.78%, 20=0.71%, 50=0.01%
  cpu          : usr=1.05%, sys=7.49%, ctx=817981, majf=0, minf=103
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=817858,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=53.2MiB/s (55.8MB/s), 53.2MiB/s-53.2MiB/s (55.8MB/s-55.8MB/s), io=3195MiB (3350MB), run=60001-60001msec

Disk stats (read/write):
  sdb: ios=811605/4, merge=0/1, ticks=226346/3, in_queue=226350, util=100.00%

======== RUN 2 =========

registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.33
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=277MiB/s][r=70.9k IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=1094: Fri Mar 21 18:03:23 2025
  read: IOPS=72.9k, BW=285MiB/s (299MB/s)(4096MiB/14384msec)
    slat (usec): min=20, max=3941, avg=51.95, stdev=24.49
    clat (usec): min=2, max=5271, avg=810.08, stdev=117.76
     lat (usec): min=45, max=5334, avg=862.03, stdev=122.81
    clat percentiles (usec):
     |  1.00th=[  660],  5.00th=[  709], 10.00th=[  725], 20.00th=[  750],
     | 30.00th=[  766], 40.00th=[  783], 50.00th=[  791], 60.00th=[  807],
     | 70.00th=[  816], 80.00th=[  840], 90.00th=[  881], 95.00th=[  955],
     | 99.00th=[ 1336], 99.50th=[ 1483], 99.90th=[ 1860], 99.95th=[ 2147],
     | 99.99th=[ 2900]
   bw (  KiB/s): min=196264, max=315960, per=100.00%, avg=293362.00, stdev=6237.12, samples=112
   iops        : min=49066, max=78990, avg=73340.50, stdev=1559.28, samples=112
  lat (usec)   : 4=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.01%
  lat (usec)   : 750=19.55%, 1000=76.53%
  lat (msec)   : 2=3.85%, 4=0.06%, 10=0.01%
  cpu          : usr=4.22%, sys=36.79%, ctx=1048623, majf=0, minf=110
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1048576,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=285MiB/s (299MB/s), 285MiB/s-285MiB/s (299MB/s-299MB/s), io=4096MiB (4295MB), run=14384-14384msec

Disk stats (read/write):
  sdb: ios=1036222/0, merge=0/0, ticks=41377/0, in_queue=41378, util=96.85%

# free -h
               total        used        free      shared  buff/cache   available
Mem:           7.7Gi       387Mi       3.3Gi       628Ki       4.3Gi       7.4Gi
Swap:             0B          0B          0B

=============== RBD_NOCACHE ============

registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.33
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=38.6MiB/s][r=9890 IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=1822: Fri Mar 21 17:40:55 2025
  read: IOPS=10.2k, BW=39.9MiB/s (41.9MB/s)(2396MiB/60001msec)
    slat (usec): min=146, max=10638, avg=388.14, stdev=174.69
    clat (usec): min=2, max=21535, avg=5858.97, stdev=832.41
     lat (usec): min=362, max=21954, avg=6247.12, stdev=868.24
    clat percentiles (usec):
     |  1.00th=[ 4686],  5.00th=[ 4948], 10.00th=[ 5080], 20.00th=[ 5276],
     | 30.00th=[ 5473], 40.00th=[ 5604], 50.00th=[ 5735], 60.00th=[ 5866],
     | 70.00th=[ 6063], 80.00th=[ 6259], 90.00th=[ 6587], 95.00th=[ 7111],
     | 99.00th=[ 9241], 99.50th=[10159], 99.90th=[12911], 99.95th=[14091],
     | 99.99th=[16319]
   bw (  KiB/s): min=27920, max=45992, per=100.00%, avg=40919.80, stdev=703.00, samples=476
   iops        : min= 6980, max=11498, avg=10229.95, stdev=175.75, samples=476
  lat (usec)   : 4=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=99.42%, 20=0.57%, 50=0.01%
  cpu          : usr=0.74%, sys=5.96%, ctx=613573, majf=0, minf=105
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=613414,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=39.9MiB/s (41.9MB/s), 39.9MiB/s-39.9MiB/s (41.9MB/s-41.9MB/s), io=2396MiB (2513MB), run=60001-60001msec

Disk stats (read/write):
  sdc: ios=612284/0, merge=0/0, ticks=229266/0, in_queue=229266, util=99.70%



On 20/03/2025 at 15:15, Chris Palmer wrote:

I just ran that command on one of my VMs. Salient details:

 * Ceph cluster 19.2.1 with 3 nodes, 4 x SATA disks with shared NVMe
   DB/WAL, single 10g NICs
 * Proxmox 8.3.5 cluster with 2 nodes (separate nodes from Ceph), single
   10g NICs, single 1g NICs for corosync
 * Test VM was using KRBD R3 pool on HDD, iothread=1, aio=io_uring,
   cache=writeback

The results are very different:

# fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k
--numjobs=4 --size=1G --runtime=60 --group_reporting --iodepth=16
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B,
(T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.37
Starting 4 processes
Jobs: 4 (f=4): [r(4)][-.-%][r=1080MiB/s][r=277k IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=13355: Thu Mar 20 13:57:05 2025
  read: IOPS=273k, BW=1068MiB/s (1120MB/s)(4096MiB/3835msec)
    slat (usec): min=7, max=3802, avg=13.77, stdev= 6.41
    clat (nsec): min=599, max=4395.1k, avg=215298.68, stdev=38131.71
     lat (usec): min=11, max=4408, avg=229.07, stdev=40.01
    clat percentiles (usec):
     |  1.00th=[  194],  5.00th=[  200], 10.00th=[  202], 20.00th=[  204],
     | 30.00th=[  206], 40.00th=[  208], 50.00th=[  210], 60.00th=[  212],
     | 70.00th=[  215], 80.00th=[  217], 90.00th=[  227], 95.00th=[  243],
     | 99.00th=[  367], 99.50th=[  420], 99.90th=[  594], 99.95th=[  668],
     | 99.99th=[  963]
   bw (  MiB/s): min=  920, max= 1118, per=100.00%, avg=1068.04, stdev=16.81, samples=28
   iops        : min=235566, max=286286, avg=273417.14, stdev=4303.79, samples=28
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 20=0.01%, 50=0.01%, 100=0.01%, 250=96.06%, 500=3.67%
  lat (usec)   : 750=0.24%, 1000=0.02%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=4.68%, sys=29.99%, ctx=1048987, majf=0, minf=102
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1048576,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=1068MiB/s (1120MB/s), 1068MiB/s-1068MiB/s (1120MB/s-1120MB/s), io=4096MiB (4295MB), run=3835-3835msec

Disk stats (read/write):
  sdc: ios=999346/0, sectors=7994768/0, merge=0/0, ticks=10360/0, in_queue=10361, util=95.49%



On 20/03/2025 12:23, Eneko Lacunza wrote:
Hi Giovanna,

I just tested one of my VMs:
# fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k
--numjobs=4 --size=1G --runtime=60 --group_reporting
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W)
4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W)
4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.33
Starting 4 processes
registry-read: Laying out IO file (1 file / 1024MiB)
registry-read: Laying out IO file (1 file / 1024MiB)
registry-read: Laying out IO file (1 file / 1024MiB)
registry-read: Laying out IO file (1 file / 1024MiB)
Jobs: 4 (f=0): [f(4)][100.0%][r=33.5MiB/s][r=8578 IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=24261: Thu Mar 20 12:57:26 2025
  read: IOPS=8538, BW=33.4MiB/s (35.0MB/s)(2001MiB/60001msec)
    slat (usec): min=309, max=4928, avg=464.54, stdev=73.15
    clat (nsec): min=602, max=1532.4k, avg=1999.15, stdev=3724.16
     lat (usec): min=310, max=4931, avg=466.54, stdev=73.36
    clat percentiles (nsec):
     |  1.00th=[  812],  5.00th=[  884], 10.00th=[  940], 20.00th=[ 1096],
     | 30.00th=[ 1368], 40.00th=[ 1576], 50.00th=[ 1720], 60.00th=[ 1832],
     | 70.00th=[ 1944], 80.00th=[ 2096], 90.00th=[ 2480], 95.00th=[ 3024],
     | 99.00th=[12480], 99.50th=[15808], 99.90th=[47360], 99.95th=[61696],
     | 99.99th=[90624]
   bw (  KiB/s): min=30448, max=35868, per=100.00%, avg=34155.76, stdev=269.75, samples=476
   iops        : min= 7612, max= 8966, avg=8538.87, stdev=67.43, samples=476
  lat (nsec)   : 750=0.06%, 1000=14.94%
  lat (usec)   : 2=59.18%, 4=23.07%, 10=1.28%, 20=1.17%, 50=0.21%
  lat (usec)   : 100=0.08%, 250=0.01%, 500=0.01%
  lat (msec)   : 2=0.01%
  cpu          : usr=1.04%, sys=5.50%, ctx=537639, majf=0, minf=36
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=512316,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=33.4MiB/s (35.0MB/s), 33.4MiB/s-33.4MiB/s (35.0MB/s-35.0MB/s), io=2001MiB (2098MB), run=60001-60001msec

Results are worse than yours, but this is on a production (not very
busy) pool with 4x3.84TB SATA disks (4 disks total vs ~15 disks in
your case) and 10G network.

The VM CPU is x86-64-v3 and the host CPU is a Ryzen 1700.

I get almost the same IOPS with --iodepth=16.

I tried moving the VM to a Ryzen 5900X and results are somewhat better:

# fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k
--numjobs=4 --size=1G --runtime=60 --group_reporting --iodepth=16
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W)
4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.33
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=45.4MiB/s][r=11.6k IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=24282: Thu Mar 20 13:18:23 2025
  read: IOPS=11.6k, BW=45.5MiB/s (47.7MB/s)(2730MiB/60001msec)
    slat (usec): min=110, max=21206, avg=341.21, stdev=79.69
    clat (nsec): min=1390, max=42395k, avg=5147009.08, stdev=475506.40
     lat (usec): min=335, max=42779, avg=5488.22, stdev=498.03
    clat percentiles (usec):
     |  1.00th=[ 4621],  5.00th=[ 4752], 10.00th=[ 4817], 20.00th=[ 4948],
     | 30.00th=[ 5014], 40.00th=[ 5080], 50.00th=[ 5080], 60.00th=[ 5145],
     | 70.00th=[ 5211], 80.00th=[ 5276], 90.00th=[ 5407], 95.00th=[ 5538],
     | 99.00th=[ 6194], 99.50th=[ 6783], 99.90th=[ 9765], 99.95th=[12125],
     | 99.99th=[24249]
   bw (  KiB/s): min=36434, max=48352, per=100.00%, avg=46612.18, stdev=300.09, samples=476
   iops        : min= 9108, max=12088, avg=11653.04, stdev=75.03, samples=476
  lat (usec)   : 2=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=99.90%, 20=0.08%, 50=0.01%
  cpu          : usr=0.98%, sys=4.18%, ctx=706399, majf=0, minf=99
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=698956,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=45.5MiB/s (47.7MB/s), 45.5MiB/s-45.5MiB/s (47.7MB/s-47.7MB/s), io=2730MiB (2863MB), run=60001-60001msec

I think we're limited by the I/O thread. I suggest you try multiple
disks with VirtIO SCSI single.
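
For example, something along these lines (volume names and sizes are made up; with VirtIO SCSI single each scsiN disk gets its own controller and, with iothread=1, its own I/O thread):

scsihw: virtio-scsi-single
scsi0: <storage>:vm-<vmid>-disk-0,iothread=1,size=15G
scsi1: <storage>:vm-<vmid>-disk-1,iothread=1,size=15G
scsi2: <storage>:vm-<vmid>-disk-2,iothread=1,size=15G

Then run the fio job against the extra disks in parallel, or stripe them together with LVM/md.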

My VM conf:
agent: 1
boot: order=scsi0;ide2;net0
cores: 2
cpu: x86-64-v3
ide2: none,media=cdrom
memory: 2048
meta: creation-qemu=9.0.2,ctime=1739888364
name: elacunza-btrfs-test
net0: virtio=BC:24:11:47:9B:58,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: proxmox_r3_ssd2:vm-112-disk-0,discard=on,iothread=1,size=15G
scsihw: virtio-scsi-single
smbios1: uuid=263ab229-4379-4abf-b6bf-615b98ccd3d4
sockets: 1
vmgenid: 13b7f2a4-2a42-4600-845a-da88f96ae6e8

I think this is a KVM/QEMU issue, not a Ceph issue :) Maybe you can
get better suggestions on the pve-user mailing list.

Cheers

On 20/3/25 at 12:29, Giovanna Ratini wrote:
Hello Eneko,

This is my configuration. The performance is similar across all VMs.
I am now checking GitLab, as that is where people are complaining the
most.

agent: 1
balloon: 65000
bios: ovmf
boot: order=scsi0;net0
cores: 10
cpu: host
efidisk0: cephvm:vm-6506-disk-0,efitype=4m,size=528K
memory: 130000
meta: creation-qemu=9.0.2,ctime=1734995123
name: gitlab02
net0: virtio=BC:24:11:6E:28:71,bridge=vmbr1,firewall=1
numa: 0
ostype: l26
scsi0:
cephvm:vm-6506-disk-1,aio=native,cache=writeback,iothread=1,size=64G,ssd=1
scsi1:
cephvm:vm-6506-disk-2,aio=native,cache=writeback,iothread=1,size=10T,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=0a5294c0-c82a-40f2-aae4-f5880022a2ac
sockets: 2
vmgenid: ea610fde-6c71-4b7f-9257-fa431a428e16

Cheers,

Gio

On 20.03.2025 at 10:23, Eneko Lacunza wrote:
Hi Giovanna,

Can you post VM's full config?

Also, can you test with IO thread enabled and SCSI virtio single,
and multiple disks?

Cheers

On 19/3/25 at 17:27, Giovanna Ratini wrote:

Hello Eneko,

Yes I did.  No significant changes.  :-(
Cheers,

Gio


On Wednesday, March 19, 2025, 13:09 CET, Eneko Lacunza
<elacunza@xxxxxxxxx> wrote:

Hi Giovanna,

Have you tried increasing iothreads option for the VM?

Cheers

On 18/3/25 at 19:13, Giovanna Ratini wrote:
> Hello Anthony,
>
> No, there is no QoS applied to the VMs.
>
> The server has PCIe Gen 4.
>
> ceph osd dump | grep pool
> pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags
> hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
> read_balance_score 13.04
> pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on
> last_change 598 lfor 0/598/596 flags hashpspool stripe_width 0
> application cephfs read_balance_score 2.02
> pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on
> last_change 50 flags hashpspool stripe_width 0 pg_autoscale_bias 4
> pg_num_min 16 recovery_priority 5 application cephfs
> read_balance_score 2.42
> pool 4 'cephvm' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 16386
> lfor 0/644/2603 flags hashpspool,selfmanaged_snaps stripe_width 0
> application rbd read_balance_score 1.52
>
> I think this is the default config. 🙈
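>
> (To double-check the PG numbers, `ceph osd pool autoscale-status` shows the autoscaler's current and target PG count per pool.)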
>
> I will search for a firmware upgrade for my Supermicro chassis.
>
> Thank you
>
>
> On 18.03.2025 at 17:57, Anthony D'Atri wrote:
>>> Then I tested on the *Proxmox host*, and the results were
>>> significantly better.
>> My Proxmox prowess is limited, but from my experience with other
>> virtualization platforms, I have to ask if there is any QoS
>> throttling applied to VMs.  With OpenStack or DO there is often IOPS
>> and/or throughput throttling via libvirt to mitigate noisy neighbors.
>>
>>>   fio --name=host-test --filename=/dev/rbd0 --ioengine=libaio
>>> --rw=randread --bs=4k --numjobs=4 --iodepth=32 --size=1G
>>> --runtime=60 --group_reporting
>>>
>>> IOPS: 1.54M
>>> Bandwidth: 6032MiB/s (6325MB/s)
>>> Latency:
>>>  * Avg: 39.8µs
>>>  * 99.9th percentile: 71µs
>>> CPU Usage: usr=22.60%, sys=77.13%
>>>
>>> On 18.03.2025 at 15:27, Anthony D'Atri wrote:
>>>> Which NVMe drive SKUs specifically?
>>> /dev/nvme6n1 – KCD61LUL15T3 – 15.36 TB – SN: 6250A02QT5A8
>>> /dev/nvme5n1 – KCD61LUL15T3 – 15.36 TB – SN: 42R0A036T5A8
>>> /dev/nvme4n1 – KCD61LUL15T3 – 15.36 TB – SN: 6250A02UT5A8
>> Kioxia CD6.  If you were using client-class drives all manner of
>> performance issues would be expected.
>>
>> Is your server chassis at least PCIe Gen 4?  If it’s Gen 3 that may
>> hamper these drives.
>>
>> Also, how many of these are in your cluster? If it’s a small number
>> you might still benefit from chopping each into at least 2 separate
>> OSDs.
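>>
>> (When redeploying, something like `ceph-volume lvm batch --osds-per-device 2 /dev/nvmeXn1` is one way to carve each device into two OSDs; note it wipes the device, so redo one device at a time.)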
>>
>> And please send `ceph osd dump | grep pool`, having too few PGs
>> wouldn’t do you any favors.
>>
>>
>>>> Are you running a recent kernel?
>>> penultimate: 6.8.12-8-pve (VM, yes)
>> Groovy.  If you were running like a CentOS 6 or CentOS 7 kernel then
>> NVMe issues might be expected as old kernels had rudimentary NVMe
>> support.
>>
>>>>   Have you updated firmware on the NVMe devices?
>>> No.
>> Kioxia appears to not release firmware updates publicly but your
>> chassis brand (Dell, HP, SMCI, etc) might have an update.
>>
>> e.g. https://www.dell.com/support/home/en-vc/drivers/driversdetails?driverid=7ny55
>>
>> If there is an available update I would strongly suggest applying.
>
>>
>>> Thanks again,
>>>
>>> best regards,
>>> Gio
>>>
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es/
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx





Eneko Lacunza
Director Técnico | Zuzendari teknikoa
Binovo IT Human Project

Tel. 943 569 206 | elacunza@xxxxxxxxx
binovo.es | Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO/ https://www.linkedin.com/company/37269706/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es/
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx