Hi all,
we have a new Ceph cluster and are seeing some very bad/strange
performance behavior.
I really don't understand what I'm doing wrong here and would be more
than happy if anyone has an idea.
Even a hint on what to look at would be helpful.
Some Information:
Machines (8 nodes), per node:
- CPU 2x Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz (28 Cores)
- 384 GB RAM
- 20x Dell Ent NVMe AGN RI U.2 7.68TB (for OSDs)
- 4x 25G LACP Backend
- 2x 25G LACP Frontend
- OS:
- Ubuntu 22.04
- Kernel: 5.15.0
- Ceph:
- Version 18.2.4
- 160 osds
- 4096 PGs for the VM pool
I ran some of the fio benchmarks from the Proxmox Ceph performance paper:
https://www.proxmox.com/images/download/pve/docs/Proxmox-VE_Ceph-Benchmark-202009-rev2.pdf
The first test should reach about 1500 IOPS (the paper reports 1806),
but we only get about 170.
root@ceph001:/mnt# fio --ioengine=psync --filename=test_fio --size=9G \
  --time_based --name=fio --group_reporting --runtime=60 --direct=1 \
  --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1
fio: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
4096B-4096B, ioengine=psync, iodepth=1
fio-3.28
Starting 1 process
fio: Laying out IO file (1 file / 9216MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=680KiB/s][w=170 IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=174797: Wed Jan 22 20:19:19 2025
write: IOPS=202, BW=811KiB/s (831kB/s)(47.5MiB/60003msec); 0 zone resets
clat (usec): min=2185, max=20081, avg=4925.43, stdev=931.63
lat (usec): min=2186, max=20082, avg=4926.19, stdev=931.63
clat percentiles (usec):
| 1.00th=[ 3425], 5.00th=[ 3818], 10.00th=[ 3982], 20.00th=[ 4293],
| 30.00th=[ 4490], 40.00th=[ 4686], 50.00th=[ 4817], 60.00th=[ 5014],
| 70.00th=[ 5211], 80.00th=[ 5407], 90.00th=[ 5800], 95.00th=[ 6063],
| 99.00th=[ 8586], 99.50th=[ 9503], 99.90th=[12256], 99.95th=[13304],
| 99.99th=[19006]
bw ( KiB/s): min= 672, max= 1000, per=100.00%, avg=813.11,
stdev=73.18, samples=119
iops : min= 168, max= 250, avg=203.28, stdev=18.29, samples=119
lat (msec) : 4=10.24%, 10=89.43%, 20=0.32%, 50=0.01%
cpu : usr=0.25%, sys=2.57%, ctx=36503, majf=0, minf=15
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
issued rwts: total=0,12167,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=811KiB/s (831kB/s), 811KiB/s-811KiB/s (831kB/s-831kB/s),
io=47.5MiB (49.8MB), run=60003-60003msec
Disk stats (read/write):
rbd0: ios=0/24296, merge=0/2, ticks=0/56351, in_queue=56351, util=99.97%
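If I read this right, the run is purely latency-bound: with iodepth=1
and sync=1 the achievable IOPS are just the inverse of the per-write
latency, so the ~4.9 ms average clat already explains the ~200 IOPS
(quick sanity check, numbers taken from the output above):

# iodepth=1 + sync=1: IOPS ~ 1 / avg write latency
python3 -c "print(1 / 0.004925)"   # ~203, matching the reported 202 IOPS

So the real question seems to be why a single 4K sync write takes ~5 ms
end to end on all-NVMe OSDs.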
Bandwidth and IOPS with a higher IO depth look OK to me:
fio --filename=/mnt/testingfio1 --size=50GB --direct=1 --rw=randrw \
  --bs=4k --ioengine=libaio --iodepth=256 --runtime=150 --numjobs=1 \
  --time_based --group_reporting --name=iops-test-job --eta-newline=1
iops-test-job: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B,
(T) 4096B-4096B, ioengine=libaio, iodepth=256
fio-3.28
iops-test-job: (groupid=0, jobs=1): err= 0: pid=146931: Wed Jan 22
19:43:14 2025
read: IOPS=20.0k, BW=78.0MiB/s (81.8MB/s)(11.4GiB/150006msec)
slat (nsec): min=1245, max=7415.7k, avg=22636.97, stdev=224525.03
clat (usec): min=238, max=32714, avg=5620.53, stdev=2255.97
lat (usec): min=243, max=32721, avg=5643.32, stdev=2258.28
clat percentiles (usec):
| 1.00th=[ 1876], 5.00th=[ 2311], 10.00th=[ 2671], 20.00th=[ 3654],
| 30.00th=[ 4146], 40.00th=[ 4752], 50.00th=[ 5342], 60.00th=[ 6128],
| 70.00th=[ 6915], 80.00th=[ 7635], 90.00th=[ 8717], 95.00th=[ 9896],
| 99.00th=[10683], 99.50th=[10945], 99.90th=[11863], 99.95th=[12649],
| 99.99th=[14615]
bw ( KiB/s): min=63254, max=98432, per=100.00%, avg=79914.04,
stdev=6290.69, samples=299
iops : min=15813, max=24608, avg=19978.36, stdev=1572.70,
samples=299
write: IOPS=19.9k, BW=77.9MiB/s (81.7MB/s)(11.4GiB/150006msec); 0
zone resets
slat (nsec): min=1349, max=8871.7k, avg=23250.80, stdev=225370.53
clat (usec): min=629, max=81108, avg=7160.58, stdev=2338.33
lat (usec): min=633, max=81114, avg=7183.98, stdev=2348.98
clat percentiles (usec):
| 1.00th=[ 2900], 5.00th=[ 3982], 10.00th=[ 4293], 20.00th=[ 5014],
| 30.00th=[ 5735], 40.00th=[ 6325], 50.00th=[ 6980], 60.00th=[ 7570],
| 70.00th=[ 8225], 80.00th=[ 9110], 90.00th=[10421], 95.00th=[11207],
| 99.00th=[13435], 99.50th=[14353], 99.90th=[16581], 99.95th=[17957],
| 99.99th=[21365]
bw ( KiB/s): min=61755, max=98813, per=100.00%, avg=79877.64,
stdev=6336.82, samples=299
iops : min=15438, max=24703, avg=19969.22, stdev=1584.22,
samples=299
lat (usec) : 250=0.01%, 500=0.03%, 750=0.07%, 1000=0.08%
lat (msec) : 2=0.65%, 4=15.15%, 10=75.45%, 20=8.55%, 50=0.01%
lat (msec) : 100=0.01%
cpu : usr=7.36%, sys=18.55%, ctx=155494, majf=0, minf=9200
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%,
>=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.1%
issued rwts: total=2993949,2992429,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
READ: bw=78.0MiB/s (81.8MB/s), 78.0MiB/s-78.0MiB/s
(81.8MB/s-81.8MB/s), io=11.4GiB (12.3GB), run=150006-150006msec
WRITE: bw=77.9MiB/s (81.7MB/s), 77.9MiB/s-77.9MiB/s
(81.7MB/s-81.7MB/s), io=11.4GiB (12.3GB), run=150006-150006msec
Disk stats (read/write):
rbd0: ios=2989470/2987987, merge=0/1, ticks=9760844/13043854,
in_queue=22804699, util=100.00%
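As a rough cross-check (assuming Little's law, queue depth ~ IOPS x
latency, with the averages from the output above):

# ~39.9k combined IOPS at ~6.4 ms average latency
python3 -c "print((20000 + 19900) * 0.0064)"   # ~255, i.e. roughly the iodepth of 256

So as far as I can tell the per-request latency here is in the same
5-7 ms range as in the sync test; the higher numbers just come from the
parallelism.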
We have 4096 PGs on the tested pool.
root@ceph001:/mnt# ceph -s
cluster:
id:
health: HEALTH_OK
services:
mon: 5 daemons, quorum ceph001,ceph002,ceph003,ceph005,ceph006 (age
52m)
mgr: ceph002.hgppdu(active, since 2d), standbys: ceph001.ooznoq
osd: 160 osds: 160 up (since 5w), 160 in (since 5M)
rgw: 2 daemons active (2 hosts, 1 zones)
data:
pools: 11 pools, 8449 pgs
objects: 7.80M objects, 19 TiB
usage: 56 TiB used, 1.0 PiB / 1.1 PiB avail
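A rough estimate of the PG distribution (assuming size=3 everywhere,
which matches 56 TiB used for 19 TiB of data):

# PG replicas per OSD ~ total PGs * replication / OSD count
python3 -c "print(8449 * 3 / 160)"   # ~158 PG replicas per OSD

which seems to be within the usual range, so I don't think the PG count
itself is the problem.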
root@ceph001:~# ceph config get osd osd_memory_target
4294967296
root@ceph001:~# ceph config get osd
WHO MASK LEVEL OPTION VALUE
...
osd advanced osd_memory_target_autotune true
...
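Since osd_memory_target_autotune is true, I assume the 4 GiB above is
just the global default and the mgr sets the effective value per OSD;
something like the following should show what an individual OSD is
actually running with (osd.0 just as an example):

root@ceph001:~# ceph config show osd.0 osd_memory_target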
We would like to use the cluster with OpenStack Cinder, but the tests
above were run directly on the cluster nodes against a mapped RBD
image. The numbers from inside VMs are similar.
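If it helps, I could also repeat the 4K sync test through librbd
directly (fio's rbd engine) to take the filesystem and the kernel RBD
mapping out of the picture; roughly like this, with pool and image
names as placeholders:

fio --ioengine=rbd --clientname=admin --pool=vm-pool --rbdname=fio-test \
  --rw=write --bs=4K --iodepth=1 --numjobs=1 --direct=1 \
  --time_based --runtime=60 --name=fio-rbd-qd1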
Thanks in advance.
Jan