Here’s a benchmark of another setup I did a few months back, with NVMe
flash drives and a Mellanox EVPN fabric (Spectrum ASIC) between the
nodes (no RDMA). 3 hosts and 24 drives in total.
root@test01:~# fio --ioengine=libaio --filename=/dev/sdb --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=fio
fio: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=6966KiB/s][w=1741 IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=115698: Tue May 28 16:54:38 2024
  write: IOPS=1804, BW=7218KiB/s (7391kB/s)(423MiB/60001msec); 0 zone resets
    slat (nsec): min=2872, max=92926, avg=5026.65, stdev=2710.03
    clat (usec): min=419, max=4486, avg=548.34, stdev=54.66
     lat (usec): min=461, max=4490, avg=553.37, stdev=55.02
    clat percentiles (usec):
     |  1.00th=[  486],  5.00th=[  502], 10.00th=[  510], 20.00th=[  523],
     | 30.00th=[  529], 40.00th=[  537], 50.00th=[  545], 60.00th=[  553],
     | 70.00th=[  562], 80.00th=[  570], 90.00th=[  586], 95.00th=[  594],
     | 99.00th=[  660], 99.50th=[  758], 99.90th=[ 1156], 99.95th=[ 1287],
     | 99.99th=[ 2606]
   bw (  KiB/s): min= 6664, max= 8072, per=100.00%, avg=7225.95, stdev=268.19, samples=119
   iops        : min= 1666, max= 2018, avg=1806.49, stdev=67.05, samples=119
  lat (usec)   : 500=4.95%, 750=94.52%, 1000=0.38%
  lat (msec)   : 2=0.13%, 4=0.02%, 10=0.01%
  cpu          : usr=0.57%, sys=1.46%, ctx=108317, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,108275,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=7218KiB/s (7391kB/s), 7218KiB/s-7218KiB/s (7391kB/s-7391kB/s), io=423MiB (443MB), run=60001-60001msec

Disk stats (read/write):
  sdb: ios=80/108093, merge=0/0, ticks=21/59172, in_queue=59193, util=99.96%
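As a rough sanity check on these numbers: at iodepth=1 the IOPS are essentially the inverse of the per-write round-trip latency, and 1 s / ~548 usec average clat works out to about 1825 IOPS, which lines up with the ~1800 write IOPS fio reports above. So the whole data path cost a bit over half a millisecond per 4K sync write in that setup.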
This was run in a VM on VMware, so iSCSI was involved in the data path in addition to the normal Ceph replication; the Ceph setup itself was mostly out-of-the-box and standard.
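For comparison, one could take VMware and iSCSI out of the picture by running the same 4K write test directly against an RBD image with fio's rbd engine; roughly along these lines (the pool name, image name and client name are just placeholders):

fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=testimg \
    --direct=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based --name=fio-rbd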
I wouldn’t consider 40 (or 400 in the SSD cluster) a bad value had I not seen substantially better values in the past. And even the 1000 would be a very substantial improvement over what I see now.
Best regards
Martin
--
Martin Gerhard Loschwitz
Geschäftsführer / CEO, True West IT Services GmbH
P +49 2433 5253130
M +49 176 61832178
A Schmiedegasse 24a, 41836 Hückelhoven, Deutschland
R HRB 21985, Amtsgericht Mönchengladbach