Re: Experience with 100G Ceph in Proxmox

Digging in a different direction: I have a question. Are the drives
connected to a RAID controller card? And how are they presented to the OS?

I don't recall where I read it, but there was something about a RAID card
presenting drives to the kernel as SCSI instead of NVMe, with the queue
depth then being the bottleneck.
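
A minimal sketch of how to check that from the host (device names below are
placeholders, not taken from this setup):

# lsblk -d -o NAME,TRAN,MODEL,SIZE       <- TRAN column shows nvme vs. sas/sata
# nvme list                              <- natively attached NVMe drives show up here
# cat /sys/block/sda/device/queue_depth  <- queue depth of a SCSI-presented drive

If the drives only appear as sdX devices behind a controller, that would
point in this direction.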

Cheers.
--


Alvaro Soto


Note: My work hours may not be your work hours. Please do not feel the need
to respond during a time that is not convenient for you.
----------------------------------------------------------
Great people talk about ideas,
ordinary people talk about things,
small people talk... about other people



On Thu, Mar 20, 2025, 2:13 PM Giovanna Ratini <
giovanna.ratini@xxxxxxxxxxxxxxx> wrote:

> Hello,
>
> Yes, I will test KRBD. I will be on holiday next week, so I don’t want
> to make any changes before then.
>
> Could you wait until March 29?
>
> This is a production environment, and restoring a backup would take
> time. Or do you think there's no risk in making the change right away?
>
> Thank you,
>
> best Regards,
> Gio
>
>
> > On 20.03.2025 at 16:57, Eneko Lacunza wrote:
> > Hi Chris,
> >
> > I tried KRBD, even with a newly created disk and after shutting down
> > and starting the VM again, but there was no measurable difference.
> >
> > Our Ceph is 18.2.4; that may be a factor to consider, but 9k -> 273k?!
> >
> > Maybe Giovanna can test KRBD option and report back... :)
> >
> > Cheers
> >
> > On 20/3/25 at 16:19, Chris Palmer wrote:
> >> Hi Eneko
> >>
> >> No containers. In the Proxmox console go to Datacenter\Storage, click
> >> on the storage you are using, then Edit. There is a tick box labelled KRBD.
> >> With that set, any virtual disks created in that storage will use
> >> KRBD rather than librbd, so it applies to all VMs that use that storage.
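> >>
> >> For reference, a sketch of the same setting via the CLI (storage name
> >> 'cephvm' is taken from the config posted earlier in the thread; adjust
> >> to your own):
> >>
> >> # pvesm set cephvm --krbd 1
> >>
> >> The flag lives on the RBD storage definition in /etc/pve/storage.cfg, and
> >> running VMs only pick it up once their disks are reattached or the VM is
> >> stopped and started again.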
> >>
> >> Chris
> >>
> >> On 20/03/2025 15:00, Eneko Lacunza wrote:
> >>>
> >>> Chris, did you test from a container? Or how do you configure a KRBD
> >>> disk for a VM?
> >>>
> >>> On 20/3/25 at 15:15, Chris Palmer wrote:
> >>>> I just ran that command on one of my VMs. Salient details:
> >>>>
> >>>>   * Ceph cluster 19.2.1 with 3 nodes, 4 x SATA disks with shared NVMe
> >>>>     DB/WAL, single 10g NICs
> >>>>   * Proxmox 8.3.5 cluster with 2 nodes (separate nodes from Ceph), single
> >>>>     10g NICs, single 1g NICs for corosync
> >>>>   * Test VM was using KRBD R3 pool on HDD, iothread=1, aio=io_uring,
> >>>>     cache=writeback
> >>>>
> >>>> The results are very different:
> >>>>
> >>>> # fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k
> >>>> --numjobs=4 --size=1G --runtime=60 --group_reporting --iodepth=16
> >>>> registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W)
> >>>> 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
> >>>> ...
> >>>> fio-3.37
> >>>> Starting 4 processes
> >>>> Jobs: 4 (f=4): [r(4)][-.-%][r=1080MiB/s][r=277k IOPS][eta 00m:00s]
> >>>> registry-read: (groupid=0, jobs=4): err= 0: pid=13355: Thu Mar 20
> >>>> 13:57:05 2025
> >>>>   read: IOPS=273k, BW=1068MiB/s (1120MB/s)(4096MiB/3835msec)
> >>>>     slat (usec): min=7, max=3802, avg=13.77, stdev= 6.41
> >>>>     clat (nsec): min=599, max=4395.1k, avg=215298.68, stdev=38131.71
> >>>>      lat (usec): min=11, max=4408, avg=229.07, stdev=40.01
> >>>>     clat percentiles (usec):
> >>>>      |  1.00th=[  194],  5.00th=[  200], 10.00th=[  202],
> >>>> 20.00th=[  204],
> >>>>      | 30.00th=[  206], 40.00th=[  208], 50.00th=[  210],
> >>>> 60.00th=[  212],
> >>>>      | 70.00th=[  215], 80.00th=[  217], 90.00th=[  227],
> >>>> 95.00th=[  243],
> >>>>      | 99.00th=[  367], 99.50th=[  420], 99.90th=[  594],
> >>>> 99.95th=[  668],
> >>>>      | 99.99th=[  963]
> >>>>    bw (  MiB/s): min=  920, max= 1118, per=100.00%, avg=1068.04,
> >>>> stdev=16.81, samples=28
> >>>>    iops        : min=235566, max=286286, avg=273417.14,
> >>>> stdev=4303.79, samples=28
> >>>>   lat (nsec)   : 750=0.01%, 1000=0.01%
> >>>>   lat (usec)   : 20=0.01%, 50=0.01%, 100=0.01%, 250=96.06%, 500=3.67%
> >>>>   lat (usec)   : 750=0.24%, 1000=0.02%
> >>>>   lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
> >>>>   cpu          : usr=4.68%, sys=29.99%, ctx=1048987, majf=0, minf=102
> >>>>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%,
> >>>> 32=0.0%, >=64=0.0%
> >>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> >>>> 64=0.0%, >=64=0.0%
> >>>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%,
> >>>> 64=0.0%, >=64=0.0%
> >>>>      issued rwts: total=1048576,0,0,0 short=0,0,0,0 dropped=0,0,0,0
> >>>>      latency   : target=0, window=0, percentile=100.00%, depth=16
> >>>>
> >>>> Run status group 0 (all jobs):
> >>>>    READ: bw=1068MiB/s (1120MB/s), 1068MiB/s-1068MiB/s
> >>>> (1120MB/s-1120MB/s), io=4096MiB (4295MB), run=3835-3835msec
> >>>>
> >>>> Disk stats (read/write):
> >>>>   sdc: ios=999346/0, sectors=7994768/0, merge=0/0, ticks=10360/0,
> >>>> in_queue=10361, util=95.49%
> >>>>
> >>>>
> >>>>
> >>>> On 20/03/2025 12:23, Eneko Lacunza wrote:
> >>>>> Hi Giovanna,
> >>>>>
> >>>>> I just tested one of my VMs:
> >>>>> # fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k
> >>>>> --numjobs=4 --size=1G --runtime=60 --group_reporting
> >>>>> registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W)
> >>>>> 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
> >>>>> registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W)
> >>>>> 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
> >>>>> ...
> >>>>> fio-3.33
> >>>>> Starting 4 processes
> >>>>> registry-read: Laying out IO file (1 file / 1024MiB)
> >>>>> registry-read: Laying out IO file (1 file / 1024MiB)
> >>>>> registry-read: Laying out IO file (1 file / 1024MiB)
> >>>>> registry-read: Laying out IO file (1 file / 1024MiB)
> >>>>> Jobs: 4 (f=0): [f(4)][100.0%][r=33.5MiB/s][r=8578 IOPS][eta 00m:00s]
> >>>>> registry-read: (groupid=0, jobs=4): err= 0: pid=24261: Thu Mar 20
> >>>>> 12:57:26 2025
> >>>>>   read: IOPS=8538, BW=33.4MiB/s (35.0MB/s)(2001MiB/60001msec)
> >>>>>     slat (usec): min=309, max=4928, avg=464.54, stdev=73.15
> >>>>>     clat (nsec): min=602, max=1532.4k, avg=1999.15, stdev=3724.16
> >>>>>      lat (usec): min=310, max=4931, avg=466.54, stdev=73.36
> >>>>>     clat percentiles (nsec):
> >>>>>      |  1.00th=[  812],  5.00th=[  884], 10.00th=[  940],
> >>>>> 20.00th=[ 1096],
> >>>>>      | 30.00th=[ 1368], 40.00th=[ 1576], 50.00th=[ 1720],
> >>>>> 60.00th=[ 1832],
> >>>>>      | 70.00th=[ 1944], 80.00th=[ 2096], 90.00th=[ 2480],
> >>>>> 95.00th=[ 3024],
> >>>>>      | 99.00th=[12480], 99.50th=[15808], 99.90th=[47360],
> >>>>> 99.95th=[61696],
> >>>>>      | 99.99th=[90624]
> >>>>>    bw (  KiB/s): min=30448, max=35868, per=100.00%, avg=34155.76,
> >>>>> stdev=269.75, samples=476
> >>>>>    iops        : min= 7612, max= 8966, avg=8538.87, stdev=67.43,
> >>>>> samples=476
> >>>>>   lat (nsec)   : 750=0.06%, 1000=14.94%
> >>>>>   lat (usec)   : 2=59.18%, 4=23.07%, 10=1.28%, 20=1.17%, 50=0.21%
> >>>>>   lat (usec)   : 100=0.08%, 250=0.01%, 500=0.01%
> >>>>>   lat (msec)   : 2=0.01%
> >>>>>   cpu          : usr=1.04%, sys=5.50%, ctx=537639, majf=0, minf=36
> >>>>>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%,
> >>>>> 32=0.0%, >=64=0.0%
> >>>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> >>>>> 64=0.0%, >=64=0.0%
> >>>>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> >>>>> 64=0.0%, >=64=0.0%
> >>>>>      issued rwts: total=512316,0,0,0 short=0,0,0,0 dropped=0,0,0,0
> >>>>>      latency   : target=0, window=0, percentile=100.00%, depth=1
> >>>>>
> >>>>> Run status group 0 (all jobs):
> >>>>>    READ: bw=33.4MiB/s (35.0MB/s), 33.4MiB/s-33.4MiB/s
> >>>>> (35.0MB/s-35.0MB/s), io=2001MiB (2098MB), run=60001-60001msec
> >>>>>
> >>>>> Results are worse than yours, but this is on a production (not
> >>>>> very busy) pool with 4x3.84TB SATA disks (4 disks total vs ~15
> >>>>> disks in your case) and 10G network.
> >>>>>
> >>>>> The VM CPU is x86-64-v3 and the host CPU is a Ryzen 1700.
> >>>>>
> >>>>> I get almost the same IOPS with --iodepth=16.
> >>>>>
> >>>>> I tried moving the VM to a Ryzen 5900X and the results are somewhat
> >>>>> better:
> >>>>>
> >>>>> # fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k
> >>>>> --numjobs=4 --size=1G --runtime=60 --group_reporting --iodepth=16
> >>>>> registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W)
> >>>>> 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
> >>>>> ...
> >>>>> fio-3.33
> >>>>> Starting 4 processes
> >>>>> Jobs: 4 (f=4): [r(4)][100.0%][r=45.4MiB/s][r=11.6k IOPS][eta 00m:00s]
> >>>>> registry-read: (groupid=0, jobs=4): err= 0: pid=24282: Thu Mar 20
> >>>>> 13:18:23 2025
> >>>>>   read: IOPS=11.6k, BW=45.5MiB/s (47.7MB/s)(2730MiB/60001msec)
> >>>>>     slat (usec): min=110, max=21206, avg=341.21, stdev=79.69
> >>>>>     clat (nsec): min=1390, max=42395k, avg=5147009.08,
> >>>>> stdev=475506.40
> >>>>>      lat (usec): min=335, max=42779, avg=5488.22, stdev=498.03
> >>>>>     clat percentiles (usec):
> >>>>>      |  1.00th=[ 4621],  5.00th=[ 4752], 10.00th=[ 4817],
> >>>>> 20.00th=[ 4948],
> >>>>>      | 30.00th=[ 5014], 40.00th=[ 5080], 50.00th=[ 5080],
> >>>>> 60.00th=[ 5145],
> >>>>>      | 70.00th=[ 5211], 80.00th=[ 5276], 90.00th=[ 5407],
> >>>>> 95.00th=[ 5538],
> >>>>>      | 99.00th=[ 6194], 99.50th=[ 6783], 99.90th=[ 9765],
> >>>>> 99.95th=[12125],
> >>>>>      | 99.99th=[24249]
> >>>>>    bw (  KiB/s): min=36434, max=48352, per=100.00%, avg=46612.18,
> >>>>> stdev=300.09, samples=476
> >>>>>    iops        : min= 9108, max=12088, avg=11653.04, stdev=75.03,
> >>>>> samples=476
> >>>>>   lat (usec)   : 2=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
> >>>>>   lat (msec)   : 2=0.01%, 4=0.01%, 10=99.90%, 20=0.08%, 50=0.01%
> >>>>>   cpu          : usr=0.98%, sys=4.18%, ctx=706399, majf=0, minf=99
> >>>>>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%,
> >>>>> 32=0.0%, >=64=0.0%
> >>>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> >>>>> 64=0.0%, >=64=0.0%
> >>>>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%,
> >>>>> 64=0.0%, >=64=0.0%
> >>>>>      issued rwts: total=698956,0,0,0 short=0,0,0,0 dropped=0,0,0,0
> >>>>>      latency   : target=0, window=0, percentile=100.00%, depth=16
> >>>>>
> >>>>> Run status group 0 (all jobs):
> >>>>>    READ: bw=45.5MiB/s (47.7MB/s), 45.5MiB/s-45.5MiB/s
> >>>>> (47.7MB/s-47.7MB/s), io=2730MiB (2863MB), run=60001-60001msec
> >>>>>
> >>>>> I think we're limited by the IO thread. I suggest you try multiple
> >>>>> disks with VirtIO SCSI single.
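> >>>>>
> >>>>> A minimal sketch of adding a second test disk via the CLI (the VM ID
> >>>>> comes from your config below; the 32G size and disk slot are just
> >>>>> placeholders, and a new volume is allocated on the same storage):
> >>>>>
> >>>>> # qm set 6506 --scsi2 cephvm:32,iothread=1,cache=writeback,ssd=1
> >>>>>
> >>>>> With scsihw: virtio-scsi-single, each disk with iothread=1 gets its own
> >>>>> I/O thread, so an fio run spread over several such disks can use more
> >>>>> than one thread.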
> >>>>>
> >>>>> My VM conf:
> >>>>> agent: 1
> >>>>> boot: order=scsi0;ide2;net0
> >>>>> cores: 2
> >>>>> cpu: x86-64-v3
> >>>>> ide2: none,media=cdrom
> >>>>> memory: 2048
> >>>>> meta: creation-qemu=9.0.2,ctime=1739888364
> >>>>> name: elacunza-btrfs-test
> >>>>> net0: virtio=BC:24:11:47:9B:58,bridge=vmbr0,firewall=1
> >>>>> numa: 0
> >>>>> ostype: l26
> >>>>> scsi0: proxmox_r3_ssd2:vm-112-disk-0,discard=on,iothread=1,size=15G
> >>>>> scsihw: virtio-scsi-single
> >>>>> smbios1: uuid=263ab229-4379-4abf-b6bf-615b98ccd3d4
> >>>>> sockets: 1
> >>>>> vmgenid: 13b7f2a4-2a42-4600-845a-da88f96ae6e8
> >>>>>
> >>>>> I think this is a KVM/QEMU issue, not a Ceph issue :) Maybe you
> >>>>> can get better suggestions on the pve-user mailing list.
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>> On 20/3/25 at 12:29, Giovanna Ratini wrote:
> >>>>>> Hello Eneko,
> >>>>>>
> >>>>>> This is my configuration. The performance is similar across all
> >>>>>> VMs. I am now checking GitLab, as that is where people are
> >>>>>> complaining the most.
> >>>>>>
> >>>>>> agent: 1
> >>>>>> balloon: 65000
> >>>>>> bios: ovmf
> >>>>>> boot: order=scsi0;net0
> >>>>>> cores: 10
> >>>>>> cpu: host
> >>>>>> efidisk0: cephvm:vm-6506-disk-0,efitype=4m,size=528K
> >>>>>> memory: 130000
> >>>>>> meta: creation-qemu=9.0.2,ctime=1734995123
> >>>>>> name: gitlab02
> >>>>>> net0: virtio=BC:24:11:6E:28:71,bridge=vmbr1,firewall=1
> >>>>>> numa: 0
> >>>>>> ostype: l26
> >>>>>> scsi0:
> >>>>>>
> cephvm:vm-6506-disk-1,aio=native,cache=writeback,iothread=1,size=64G,ssd=1
> >>>>>> scsi1:
> >>>>>>
> cephvm:vm-6506-disk-2,aio=native,cache=writeback,iothread=1,size=10T,ssd=1
> >>>>>> scsihw: virtio-scsi-single
> >>>>>> smbios1: uuid=0a5294c0-c82a-40f2-aae4-f5880022a2ac
> >>>>>> sockets: 2
> >>>>>> vmgenid: ea610fde-6c71-4b7f-9257-fa431a428e16
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Gio
> >>>>>>
> >>>>>> On 20.03.2025 at 10:23, Eneko Lacunza wrote:
> >>>>>>> Hi Giovanna,
> >>>>>>>
> >>>>>>> Can you post VM's full config?
> >>>>>>>
> >>>>>>> Also, can you test with IO thread enabled and SCSI virtio
> >>>>>>> single, and multiple disks?
> >>>>>>>
> >>>>>>> Cheers
> >>>>>>>
> >>>>>>> On 19/3/25 at 17:27, Giovanna Ratini wrote:
> >>>>>>>>
> >>>>>>>> Hello Eneko,
> >>>>>>>>
> >>>>>>>> Yes, I did. No significant changes. :-(
> >>>>>>>> Cheers,
> >>>>>>>>
> >>>>>>>> Gio
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wednesday, March 19, 2025 13:09 CET, Eneko Lacunza
> >>>>>>>> <elacunza@xxxxxxxxx> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Giovanna,
> >>>>>>>>>
> >>>>>>>>> Have you tried increasing the iothreads option for the VM?
> >>>>>>>>>
> >>>>>>>>> Cheers
> >>>>>>>>>
> >>>>>>>>> On 18/3/25 at 19:13, Giovanna Ratini wrote:
> >>>>>>>>> > Hello Anthony,
> >>>>>>>>> >
> >>>>>>>>> > No, there is no QoS applied to the VMs.
> >>>>>>>>> >
> >>>>>>>>> > The server has PCIe Gen 4.
> >>>>>>>>> >
> >>>>>>>>> > ceph osd dump | grep pool
> >>>>>>>>> > pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0
> >>>>>>>>> object_hash
> >>>>>>>>> > rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21
> >>>>>>>>> flags
> >>>>>>>>> > hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1
> >>>>>>>>> application mgr
> >>>>>>>>> > read_balance_score 13.04
> >>>>>>>>> > pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0
> >>>>>>>>> > object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on
> >>>>>>>>> > last_change 598 lfor 0/598/596 flags hashpspool stripe_width 0
> >>>>>>>>> > application cephfs read_balance_score 2.02
> >>>>>>>>> > pool 3 'cephfs_metadata' replicated size 3 min_size 2
> >>>>>>>>> crush_rule 0
> >>>>>>>>> > object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on
> >>>>>>>>> > last_change 50 flags hashpspool stripe_width 0
> >>>>>>>>> pg_autoscale_bias 4
> >>>>>>>>> > pg_num_min 16 recovery_priority 5 application cephfs
> >>>>>>>>> > read_balance_score 2.42
> >>>>>>>>> > pool 4 'cephvm' replicated size 3 min_size 2 crush_rule 0
> >>>>>>>>> object_hash
> >>>>>>>>> > rjenkins pg_num 128 pgp_num 128 autoscale_mode on
> >>>>>>>>> last_change 16386
> >>>>>>>>> > lfor 0/644/2603 flags hashpspool,selfmanaged_snaps
> >>>>>>>>> stripe_width 0
> >>>>>>>>> > application rbd read_balance_score 1.52
> >>>>>>>>> >
> >>>>>>>>> > I think this is the default config. 🙈
> >>>>>>>>> >
> >>>>>>>>> > I will look for a firmware update for my Supermicro chassis.
> >>>>>>>>> >
> >>>>>>>>> > Thank you
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>> > On 18.03.2025 at 17:57, Anthony D'Atri wrote:
> >>>>>>>>> >>> Then I tested on the *Proxmox host*, and the results were
> >>>>>>>>> >>> significantly better.
> >>>>>>>>> >> My Proxmox prowess is limited, but from my experience with
> >>>>>>>>> other
> >>>>>>>>> >> virtualization platforms, I have to ask if there is any QoS
> >>>>>>>>> >> throttling applied to VMs.  With OpenStack or DO there is
> >>>>>>>>> often IOPS
> >>>>>>>>> >> and/or throughput throttling via libvirt to mitigate noisy
> >>>>>>>>> neighbors.
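> >>>>>>>>> >>
> >>>>>>>>> >> In Proxmox, such limits would show up as per-disk drive options.
> >>>>>>>>> >> A quick sketch of how to rule them out (the VM ID is a placeholder):
> >>>>>>>>> >>
> >>>>>>>>> >> # qm config <vmid> | grep -E 'iops|mbps'
> >>>>>>>>> >>
> >>>>>>>>> >> No output suggests no per-disk IOPS or bandwidth caps are configured.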
> >>>>>>>>> >>
> >>>>>>>>> >>>   fio --name=host-test --filename=/dev/rbd0 --ioengine=libaio
> >>>>>>>>> >>> --rw=randread --bs=4k --numjobs=4 --iodepth=32 --size=1G
> >>>>>>>>> >>> --runtime=60 --group_reporting
> >>>>>>>>> >>>
> >>>>>>>>> >>> *IOPS*: *1.54M*
> >>>>>>>>> >>>
> >>>>>>>>> >>> # *Bandwidth*: *6032MiB/s (6325MB/s)*
> >>>>>>>>> >>> # *Latency*:
> >>>>>>>>> >>>
> >>>>>>>>> >>> * *Avg*: *39.8µs*
> >>>>>>>>> >>> * *99.9th percentile*: *71µs*
> >>>>>>>>> >>>
> >>>>>>>>> >>> # *CPU Usage*: *usr=22.60%, sys=77.13%*
> >>>>>>>>> >>> #
> >>>>>>>>> >>>
> >>>>>>>>> >>> On 18.03.2025 at 15:27, Anthony D'Atri wrote:
> >>>>>>>>> >>>> Which NVMe drive SKUs specifically?
> >>>>>>>>> >>> # */dev/nvme6n1* – *KCD61LUL15T3* – 15.36 TB – SN:
> >>>>>>>>> 6250A02QT5A8
> >>>>>>>>> >>> # */dev/nvme5n1* – *KCD61LUL15T3* – 15.36 TB – SN:
> >>>>>>>>> 42R0A036T5A8
> >>>>>>>>> >>> # */dev/nvme4n1* – *KCD61LUL15T3* – 15.36 TB – SN:
> >>>>>>>>> 6250A02UT5A8
> >>>>>>>>> >> Kioxia CD6.  If you were using client-class drives all
> >>>>>>>>> manner of
> >>>>>>>>> >> performance issues would be expected.
> >>>>>>>>> >>
> >>>>>>>>> >> Is your server chassis at least PCIe Gen 4? If it’s Gen 3
> >>>>>>>>> that may
> >>>>>>>>> >> hamper these drives.
> >>>>>>>>> >>
> >>>>>>>>> >> Also, how many of these are in your cluster? If it’s a
> >>>>>>>>> small number
> >>>>>>>>> >> you might still benefit from chopping each into at least 2
> >>>>>>>>> separate
> >>>>>>>>> >> OSDs.
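> >>>>>>>>> >>
> >>>>>>>>> >> A sketch of one way to do that with ceph-volume (destructive per
> >>>>>>>>> >> drive, so drain and recreate one OSD at a time; the device name is
> >>>>>>>>> >> a placeholder):
> >>>>>>>>> >>
> >>>>>>>>> >> # ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1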
> >>>>>>>>> >>
> >>>>>>>>> >> And please send `ceph osd dump | grep pool`, having too few
> >>>>>>>>> PGs
> >>>>>>>>> >> wouldn’t do you any favors.
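> >>>>>>>>> >>
> >>>>>>>>> >> To see what the autoscaler thinks and, if needed, raise the PG
> >>>>>>>>> >> count on the RBD pool (pool name and target count below are only
> >>>>>>>>> >> examples, not a recommendation for this cluster):
> >>>>>>>>> >>
> >>>>>>>>> >> # ceph osd pool autoscale-status
> >>>>>>>>> >> # ceph osd pool set cephvm pg_num 256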
> >>>>>>>>> >>
> >>>>>>>>> >>
> >>>>>>>>> >>>> Are you running a recent kernel?
> >>>>>>>>> >>> penultimate: 6.8.12-8-pve (VM, yes)
> >>>>>>>>> >> Groovy.  If you were running like a CentOS 6 or CentOS 7
> >>>>>>>>> kernel then
> >>>>>>>>> >> NVMe issues might be expected as old kernels had
> >>>>>>>>> rudimentary NVMe
> >>>>>>>>> >> support.
> >>>>>>>>> >>
> >>>>>>>>> >>>>   Have you updated firmware on the NVMe devices?
> >>>>>>>>> >>> No.
> >>>>>>>>> >> Kioxia appears to not release firmware updates publicly but
> >>>>>>>>> your
> >>>>>>>>> >> chassis brand (Dell, HP, SMCI, etc) might have an update.
> >>>>>>>>> >>
> >>>>>>>>> e.g.
> https://www.dell.com/support/home/en-vc/drivers/driversdetails?driverid=7ny55
> >>>>>>>>>
> >>>>>>>>> >>
> >>>>>>>>> >>
> >>>>>>>>> >>   If there is an available update I would strongly suggest
> >>>>>>>>> applying.
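> >>>>>>>>> >>
> >>>>>>>>> >> A quick sketch of how to check the running firmware revision and
> >>>>>>>>> >> the firmware slots with nvme-cli (device name is a placeholder):
> >>>>>>>>> >>
> >>>>>>>>> >> # nvme id-ctrl /dev/nvme0 | grep -i '^fr '
> >>>>>>>>> >> # nvme fw-log /dev/nvme0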
> >>>>>>>>> >
> >>>>>>>>> >>
> >>>>>>>>> >>> Thanks again,
> >>>>>>>>> >>>
> >>>>>>>>> >>> best regards,
> >>>>>>>>> >>> Gio
> >>>>>>>>> >>>
> >>>>>>>>> >>> _______________________________________________
> >>>>>>>>> >>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>>>>>>> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>>>>>>>> > _______________________________________________
> >>>>>>>>> > ceph-users mailing list -- ceph-users@xxxxxxx
> >>>>>>>>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>>>>>>>>
> >>>>>>>>> Eneko Lacunza
> >>>>>>>>> Zuzendari teknikoa | Director técnico
> >>>>>>>>> Binovo IT Human Project
> >>>>>>>>>
> >>>>>>>>> Tel. +34 943 569 206 | https://www.binovo.es
> >>>>>>>>> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> >>>>>>>>>
> >>>>>>>>> https://www.youtube.com/user/CANALBINOVO
> >>>>>>>>> https://www.linkedin.com/company/37269706/
> >>>>>>>>> _______________________________________________
> >>>>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>> Eneko Lacunza
> >>>>>>> Director Técnico | Zuzendari teknikoa
> >>>>>>> Binovo IT Human Project
> >>>>>>>
> >>>>>>> Tel. 943 569 206 | elacunza@xxxxxxxxx | binovo.es
> >>>>>>> Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun
> >>>>>>>
> >>>>>>> https://www.youtube.com/user/CANALBINOVO/
> >>>>>>> https://www.linkedin.com/company/37269706/
> >>>>>>> _______________________________________________
> >>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>>>>> _______________________________________________
> >>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>>>>
> >>>>> Eneko Lacunza
> >>>>> Zuzendari teknikoa | Director técnico
> >>>>> Binovo IT Human Project
> >>>>>
> >>>>> Tel. +34 943 569 206 | https://www.binovo.es
> >>>>> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> >>>>>
> >>>>> https://www.youtube.com/user/CANALBINOVO
> >>>>> https://www.linkedin.com/company/37269706/
> >>>>> _______________________________________________
> >>>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>>>
> >>>
> >>> Eneko Lacunza
> >>> Zuzendari teknikoa | Director técnico
> >>> Binovo IT Human Project
> >>>
> >>> Tel. +34 943 569 206 | https://www.binovo.es
> >>> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> >>>
> >>> https://www.youtube.com/user/CANALBINOVO
> >>> https://www.linkedin.com/company/37269706/
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>
> >
> > Eneko Lacunza
> > Zuzendari teknikoa | Director técnico
> > Binovo IT Human Project
> >
> > Tel. +34 943 569 206 | https://www.binovo.es
> > Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> >
> > https://www.youtube.com/user/CANALBINOVO
> > https://www.linkedin.com/company/37269706/
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



