Thanks for your observation. Indeed, I do not get a performance drop when upgrading from Nautilus to Octopus. But even using Pacific 16.1.0, the performance just goes down, so I guess we are running into the same issue somehow. I do not think just staying on Octopus is a solution, as it will reach EOL eventually. The source of this performance drop is still a mystery to me.

Luis Domingues

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, September 7th, 2021 at 10:51 AM, Martin Mlynář <nextsux@xxxxxxxxx> wrote:

> Hello,
>
> we've noticed a similar issue after upgrading our 3-node test cluster from
> 15.2.14-1~bpo10+1 to 16.1.0-1~bpo10+1.
>
> Quick tests using rados bench:
>
> 16.2.5-1~bpo10+1:
>
> Total time run:         133.28
> Total writes made:      576
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     17.2869
> Stddev Bandwidth:       34.1485
> Max bandwidth (MB/sec): 204
> Min bandwidth (MB/sec): 0
> Average IOPS:           4
> Stddev IOPS:            8.55426
> Max IOPS:               51
> Min IOPS:               0
> Average Latency(s):     3.59873
> Stddev Latency(s):      5.99964
> Max latency(s):         30.6307
> Min latency(s):         0.0865062
>
> After downgrading the OSDs:
>
> 15.2.14-1~bpo10+1:
>
> Total time run:         120.135
> Total writes made:      16324
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     543.524
> Stddev Bandwidth:       21.7548
> Max bandwidth (MB/sec): 580
> Min bandwidth (MB/sec): 436
> Average IOPS:           135
> Stddev IOPS:            5.43871
> Max IOPS:               145
> Min IOPS:               109
> Average Latency(s):     0.117646
> Stddev Latency(s):      0.0391269
> Max latency(s):         0.544229
> Min latency(s):         0.0602735
>
> We currently run on this setup:
>
> {
>     "mon": {
>         "ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)": 2
>     },
>     "mgr": {
>         "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 3
>     },
>     "osd": {
>         "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 35
>     },
>     "mds": {},
>     "overall": {
>         "ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 38,
>         "ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)": 2
>     }
> }
>
> which solved the performance issues. All OSDs were newly created and fully
> synced from the other nodes both when upgrading and when downgrading back to 15.2.
>
> Best Regards,
> Martin
>
> On 05. 09. 21 at 19:45, Luis Domingues wrote:
>
> > Hello,
> >
> > I run a test cluster of 3 machines with 24 HDDs each, running bare-metal on CentOS 8. Long story short, I get a bandwidth of ~1,200 MB/s when I run a rados bench writing 128k objects with the cluster installed with Nautilus.
> >
> > When I upgrade the cluster to Pacific (using ceph-ansible to deploy and/or upgrade), my performance drops to ~400 MB/s of bandwidth on the same rados bench.
> >
> > I am kind of clueless as to what makes the performance drop so much. Does anyone have ideas on where I can dig to find the root of this difference?
> >
> > Thanks,
> >
> > Luis Domingues
>
> Martin Mlynář

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
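
For anyone who wants to reproduce these numbers: the output above is what rados bench prints, and an invocation along the following lines produces it. The pool name, runtime and thread count here are placeholders, not necessarily the exact values used in the tests above.

    # 4 MiB writes for 120 s, matching the write/object size in the output above
    rados bench -p testbench 120 write -b 4194304 -t 16 --no-cleanup

    # 128k writes, as in the original Nautilus vs. Pacific comparison
    rados bench -p testbench 120 write -b 131072 -t 16 --no-cleanup

    # remove the benchmark objects afterwards
    rados -p testbench cleanup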
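
The version breakdown pasted above is the output of ceph versions; running it before and after an upgrade is a quick way to confirm which daemons are actually on Pacific. The tell form below is just one optional way to get per-daemon detail.

    # JSON breakdown of running daemon versions, as pasted above
    ceph versions

    # ask each OSD individually which version it is running
    ceph tell osd.* version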