Bad/strange performance on a new cluster

Hi all,

We have a new Ceph cluster that shows some very bad/strange performance behavior.

I really don't understand what I'm doing wrong here and would be more than happy if anyone has an idea.
Even a hint on what to look at would be helpful.

Some Information:

Machines (8 nodes), each with:

- CPU 2x Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz (28 Cores)
- 384 GB RAM
- 20x Dell Ent NVMe AGN RI U.2 7.68TB (for OSDs)
- 4x 25G LACP Backend
- 2x 25G LACP Frontend

- OS:
    - Ubuntu 22.04
    - Kernel: 5.15.0
- Ceph:
    - Version 18.2.4
    - 160 osds
    - 4096 PGs for the VM pool


I ran some of the fio benchmarks from the Proxmox Ceph performance paper:
https://www.proxmox.com/images/download/pve/docs/Proxmox-VE_Ceph-Benchmark-202009-rev2.pdf

The first test should reach about 1500 IOPS (the Proxmox paper reports 1806).
We only get about 170.

root@ceph001:/mnt# fio --ioengine=psync --filename=test_fio --size=9G --time_based --name=fio --group_reporting --runtime=60 --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1
fio: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.28
Starting 1 process
fio: Laying out IO file (1 file / 9216MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=680KiB/s][w=170 IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=174797: Wed Jan 22 20:19:19 2025
  write: IOPS=202, BW=811KiB/s (831kB/s)(47.5MiB/60003msec); 0 zone resets
    clat (usec): min=2185, max=20081, avg=4925.43, stdev=931.63
     lat (usec): min=2186, max=20082, avg=4926.19, stdev=931.63
    clat percentiles (usec):
     |  1.00th=[ 3425],  5.00th=[ 3818], 10.00th=[ 3982], 20.00th=[ 4293],
     | 30.00th=[ 4490], 40.00th=[ 4686], 50.00th=[ 4817], 60.00th=[ 5014],
     | 70.00th=[ 5211], 80.00th=[ 5407], 90.00th=[ 5800], 95.00th=[ 6063],
     | 99.00th=[ 8586], 99.50th=[ 9503], 99.90th=[12256], 99.95th=[13304],
     | 99.99th=[19006]
bw ( KiB/s): min= 672, max= 1000, per=100.00%, avg=813.11, stdev=73.18, samples=119
   iops        : min=  168, max=  250, avg=203.28, stdev=18.29, samples=119
  lat (msec)   : 4=10.24%, 10=89.43%, 20=0.32%, 50=0.01%
  cpu          : usr=0.25%, sys=2.57%, ctx=36503, majf=0, minf=15
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,12167,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
WRITE: bw=811KiB/s (831kB/s), 811KiB/s-811KiB/s (831kB/s-831kB/s), io=47.5MiB (49.8MB), run=60003-60003msec

Disk stats (read/write):
  rbd0: ios=0/24296, merge=0/2, ticks=0/56351, in_queue=56351, util=99.97%
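
To rule out the drives themselves, the next step would be the same sync-write test against a single raw NVMe namespace, bypassing Ceph entirely. This is only a sketch (no results pasted here); /dev/nvmeXn1 is a placeholder, and the test destroys data on that device:

# WARNING: destructive, only run against an unused device
fio --ioengine=psync --filename=/dev/nvmeXn1 --runtime=60 --time_based \
    --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 \
    --name=raw-nvme-baseline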



Bandwidth and IOPS with a higher IO depth look OK to me:

fio --filename=/mnt/testingfio1 --size=50GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=150 --numjobs=1 --time_based \
--group_reporting --name=iops-test-job --eta-newline=1

iops-test-job: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
fio-3.28
iops-test-job: (groupid=0, jobs=1): err= 0: pid=146931: Wed Jan 22 19:43:14 2025
  read: IOPS=20.0k, BW=78.0MiB/s (81.8MB/s)(11.4GiB/150006msec)
    slat (nsec): min=1245, max=7415.7k, avg=22636.97, stdev=224525.03

    clat (usec): min=238, max=32714, avg=5620.53, stdev=2255.97
     lat (usec): min=243, max=32721, avg=5643.32, stdev=2258.28
    clat percentiles (usec):
     |  1.00th=[ 1876],  5.00th=[ 2311], 10.00th=[ 2671], 20.00th=[ 3654],
     | 30.00th=[ 4146], 40.00th=[ 4752], 50.00th=[ 5342], 60.00th=[ 6128],
     | 70.00th=[ 6915], 80.00th=[ 7635], 90.00th=[ 8717], 95.00th=[ 9896],
     | 99.00th=[10683], 99.50th=[10945], 99.90th=[11863], 99.95th=[12649],
     | 99.99th=[14615]
   bw (  KiB/s): min=63254, max=98432, per=100.00%, avg=79914.04, stdev=6290.69, samples=299
   iops        : min=15813, max=24608, avg=19978.36, stdev=1572.70, samples=299
  write: IOPS=19.9k, BW=77.9MiB/s (81.7MB/s)(11.4GiB/150006msec); 0 zone resets
    slat (nsec): min=1349, max=8871.7k, avg=23250.80, stdev=225370.53
    clat (usec): min=629, max=81108, avg=7160.58, stdev=2338.33
     lat (usec): min=633, max=81114, avg=7183.98, stdev=2348.98
    clat percentiles (usec):
     |  1.00th=[ 2900],  5.00th=[ 3982], 10.00th=[ 4293], 20.00th=[ 5014],
     | 30.00th=[ 5735], 40.00th=[ 6325], 50.00th=[ 6980], 60.00th=[ 7570],
     | 70.00th=[ 8225], 80.00th=[ 9110], 90.00th=[10421], 95.00th=[11207],
     | 99.00th=[13435], 99.50th=[14353], 99.90th=[16581], 99.95th=[17957],
     | 99.99th=[21365]
   bw (  KiB/s): min=61755, max=98813, per=100.00%, avg=79877.64, stdev=6336.82, samples=299
   iops        : min=15438, max=24703, avg=19969.22, stdev=1584.22, samples=299
  lat (usec)   : 250=0.01%, 500=0.03%, 750=0.07%, 1000=0.08%
  lat (msec)   : 2=0.65%, 4=15.15%, 10=75.45%, 20=8.55%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=7.36%, sys=18.55%, ctx=155494, majf=0, minf=9200
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=2993949,2992429,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
   READ: bw=78.0MiB/s (81.8MB/s), 78.0MiB/s-78.0MiB/s (81.8MB/s-81.8MB/s), io=11.4GiB (12.3GB), run=150006-150006msec
  WRITE: bw=77.9MiB/s (81.7MB/s), 77.9MiB/s-77.9MiB/s (81.7MB/s-81.7MB/s), io=11.4GiB (12.3GB), run=150006-150006msec

Disk stats (read/write):
rbd0: ios=2989470/2987987, merge=0/1, ticks=9760844/13043854, in_queue=22804699, util=100.00%
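
To take the filesystem and RBD layers out of the picture, a rados bench run against the same pool would be a useful comparison (a sketch, not run here; <pool> is a placeholder):

rados bench -p <pool> 60 write -b 4096 -t 1 --no-cleanup
rados -p <pool> cleanup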

We have 4096 PGs on the tested pool.
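
This can be confirmed with the following (where <pool> stands for the VM pool name):

ceph osd pool get <pool> pg_num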

root@ceph001:/mnt# ceph -s
  cluster:
    id:
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum ceph001,ceph002,ceph003,ceph005,ceph006 (age 52m)
    mgr: ceph002.hgppdu(active, since 2d), standbys: ceph001.ooznoq
    osd: 160 osds: 160 up (since 5w), 160 in (since 5M)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    pools:   11 pools, 8449 pgs
    objects: 7.80M objects, 19 TiB
    usage:   56 TiB used, 1.0 PiB / 1.1 PiB avail


root@ceph001:~# ceph config get osd osd_memory_target
4294967296

root@ceph001:~# ceph config get osd
WHO MASK LEVEL OPTION VALUE ...
osd           advanced  osd_memory_target_autotune  true
...
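
Since autotune is enabled, the effective target can differ per OSD; it can be checked on a single daemon like this (osd.0 as an example):

ceph config show osd.0 osd_memory_target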

We would like to use the cluster with OpenStack Cinder, but the tests above were run directly on the cluster nodes against a mapped RBD image. The numbers from inside VMs are similar.
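
For completeness, the test image was set up roughly like this (pool/image names and the size are placeholders):

rbd create <pool>/fio-test --size 100G
rbd map <pool>/fio-test          # shows up as /dev/rbd0
mkfs.ext4 /dev/rbd0
mount /dev/rbd0 /mnt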


Thanks in advance.

Jan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


