Re: performance degradation every 30 seconds

Perhaps the WAL is filling up when the iodepth is that high? Is the WAL
on the same SSDs as the data? If you double the WAL size, does the
behavior change?
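
For what it's worth, a rough sketch of how I'd check that (osd.0 and
the 2 GiB value below are just placeholders, not recommendations):

# does osd.0 have a dedicated WAL device, or does it share the data SSD?
ceph osd metadata 0 | grep -E 'bluefs_dedicated_wal|devices'

# BlueFS / WAL usage counters, via the admin socket on the OSD host
ceph daemon osd.0 perf dump | grep -A 20 '"bluefs"'

# note: bluestore_block_wal_size only takes effect when an OSD is
# (re)created, so "doubling the WAL" really means re-provisioning the
# OSD with a larger --block.wal device/partition
ceph config set osd bluestore_block_wal_size 2147483648   # 2 GiB, example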


On Mon, Dec 14, 2020 at 9:05 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>
> On Mon, Dec 14, 2020 at 1:28 PM Philip Brown <pbrown@xxxxxxxxxx> wrote:
> >
> > Our goal is to stand up a high-performance Ceph cluster that can handle 100 very active clients, so for us, testing with iodepth=256 is actually fairly realistic.
>
> 100 active clients on the same node or just 100 active clients?
>
> > But it also exhibits the problem with iodepth=32:
> >
> > [root@irviscsi03 ~]# fio --filename=/dev/rbd0 --direct=1 --rw=randwrite --bs=4k --ioengine=libaio --iodepth=32 --numjobs=1 --time_based --group_reporting --name=iops-test-job --runtime=120 --eta-newline=1
> > iops-test-job: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
> > fio-3.7
> > Starting 1 process
> > fio: file /dev/rbd0 exceeds 32-bit tausworthe random generator.
> > fio: Switching to tausworthe64. Use the random_generator= option to get rid of this warning.
> > Jobs: 1 (f=1): [w(1)][2.5%][r=0KiB/s,w=20.5MiB/s][r=0,w=5258 IOPS][eta 01m:58s]
> > Jobs: 1 (f=1): [w(1)][4.1%][r=0KiB/s,w=41.1MiB/s][r=0,w=10.5k IOPS][eta 01m:56s]
> > Jobs: 1 (f=1): [w(1)][5.8%][r=0KiB/s,w=45.7MiB/s][r=0,w=11.7k IOPS][eta 01m:54s]
> > Jobs: 1 (f=1): [w(1)][7.4%][r=0KiB/s,w=55.3MiB/s][r=0,w=14.2k IOPS][eta 01m:52s]
> > Jobs: 1 (f=1): [w(1)][9.1%][r=0KiB/s,w=54.4MiB/s][r=0,w=13.9k IOPS][eta 01m:50s]
> > Jobs: 1 (f=1): [w(1)][10.7%][r=0KiB/s,w=53.4MiB/s][r=0,w=13.7k IOPS][eta 01m:48s]
> > Jobs: 1 (f=1): [w(1)][12.4%][r=0KiB/s,w=53.7MiB/s][r=0,w=13.7k IOPS][eta 01m:46s]
> > Jobs: 1 (f=1): [w(1)][14.0%][r=0KiB/s,w=55.7MiB/s][r=0,w=14.3k IOPS][eta 01m:44s]
> > Jobs: 1 (f=1): [w(1)][15.7%][r=0KiB/s,w=54.4MiB/s][r=0,w=13.9k IOPS][eta 01m:42s]
> > Jobs: 1 (f=1): [w(1)][17.4%][r=0KiB/s,w=51.6MiB/s][r=0,w=13.2k IOPS][eta 01m:40s]
> > Jobs: 1 (f=1): [w(1)][19.0%][r=0KiB/s,w=38.1MiB/s][r=0,w=9748 IOPS][eta 01m:38s]
> > Jobs: 1 (f=1): [w(1)][20.7%][r=0KiB/s,w=24.1MiB/s][r=0,w=6158 IOPS][eta 01m:36s]
> > Jobs: 1 (f=1): [w(1)][22.3%][r=0KiB/s,w=12.4MiB/s][r=0,w=3178 IOPS][eta 01m:34s]
> > Jobs: 1 (f=1): [w(1)][24.0%][r=0KiB/s,w=31.5MiB/s][r=0,w=8056 IOPS][eta 01m:32s]
> > Jobs: 1 (f=1): [w(1)][25.6%][r=0KiB/s,w=48.6MiB/s][r=0,w=12.4k IOPS][eta 01m:30s]
> > Jobs: 1 (f=1): [w(1)][27.3%][r=0KiB/s,w=52.2MiB/s][r=0,w=13.4k IOPS][eta 01m:28s]
> > Jobs: 1 (f=1): [w(1)][28.9%][r=0KiB/s,w=54.3MiB/s][r=0,w=13.9k IOPS][eta 01m:26s]
> > Jobs: 1 (f=1): [w(1)][30.6%][r=0KiB/s,w=52.6MiB/s][r=0,w=13.5k IOPS][eta 01m:24s]
> > Jobs: 1 (f=1): [w(1)][32.2%][r=0KiB/s,w=55.1MiB/s][r=0,w=14.1k IOPS][eta 01m:22s]
> > Jobs: 1 (f=1): [w(1)][33.9%][r=0KiB/s,w=34.3MiB/s][r=0,w=8775 IOPS][eta 01m:20s]
> > Jobs: 1 (f=1): [w(1)][35.5%][r=0KiB/s,w=52.5MiB/s][r=0,w=13.4k IOPS][eta 01m:18s]
> > Jobs: 1 (f=1): [w(1)][37.2%][r=0KiB/s,w=52.7MiB/s][r=0,w=13.5k IOPS][eta 01m:16s]
> > Jobs: 1 (f=1): [w(1)][38.8%][r=0KiB/s,w=53.9MiB/s][r=0,w=13.8k IOPS][eta 01m:14s]
>
> Have you tried different kernel versions? It might also be worthwhile
> to test using fio's "rados" engine [1] (vs. your rados bench test),
> since that might not have been an apples-to-apples comparison given
> the >400MiB/s throughput you listed (i.e. large IOs are handled
> differently than small IOs internally).
>
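
For reference, a minimal 4k randwrite job for fio's rados engine might
look something like this (pool and client name are placeholders; see
[1] for the full example):

[global]
ioengine=rados
clientname=admin
pool=testpool
busy_poll=0
rw=randwrite
bs=4k
time_based
runtime=120

[rados-4k-randwrite]
iodepth=32
size=2g

That should be closer to apples-to-apples with something like
"rados bench -p testpool 120 write -b 4096 -t 32" than with rados
bench's default 4 MiB objects.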
> >   .. etc.
> >
> >
> > ----- Original Message -----
> > From: "Jason Dillaman" <jdillama@xxxxxxxxxx>
> > To: "Philip Brown" <pbrown@xxxxxxxxxx>
> > Cc: "ceph-users" <ceph-users@xxxxxxx>
> > Sent: Monday, December 14, 2020 10:19:48 AM
> > Subject: Re: performance degradation every 30 seconds
> >
> > On Mon, Dec 14, 2020 at 12:46 PM Philip Brown <pbrown@xxxxxxxxxx> wrote:
> > >
> > > Further experimentation with fio's --rw flag, setting rw=read and rw=randwrite in addition to the original rw=randrw, indicates that the problem is tied to writes.
> > >
> > > Possibly some kind of buffer flush or cache sync delay when using the rbd device, even though fio specified --direct=1?
> >
> > It might be worthwhile to test with a more realistic iodepth instead
> > of 256, in case you are hitting odd limits due to an untested corner
> > case. Does the performance still degrade with "--iodepth=16" or
> > "--iodepth=32"?
> >
>
> [1] https://github.com/axboe/fio/blob/master/examples/rados.fio
>
> --
> Jason
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


