* bluestore: common/options.cc: disable bluefs_preextend_wal_files  <-- from the 15.2.3 changelog

There was a bug which led to issues on OSD restart, and I believe this was an attempt at mitigation until a proper bugfix could be put into place. I suspect this might be the cause of the symptoms you're seeing (see the notes after the quoted message below for a quick way to check).

https://tracker.ceph.com/issues/45613
https://github.com/ceph/ceph/pull/35293

On Thu, Jun 4, 2020 at 8:07 AM Thomas Gradisnik <tg@xxxxxxxxx> wrote:
> We have deployed a small test cluster consisting of three nodes. Each node is running a mon/mgr and two osds (Samsung PM983 3.84TB NVMe split into two partitions), so six osds in total. We started with Ceph 14.2.7 some weeks ago (upgraded to 14.2.9 later) and ran different tests using fio against some rbd volumes in order to get an overview of what performance we could expect. The configuration is unchanged compared to the defaults; we only set several debugging options to 0/0.
>
> Yesterday we upgraded the whole cluster to Ceph 15.2.3, following the upgrade guidelines, which has worked without any problems so far. Nevertheless, when running the same tests we had previously run on Ceph 14.2.9, we are seeing clear degradations in write performance (alongside some performance improvements, which should also be mentioned).
>
> Here are the results of concern (each with the relevant fio settings used):
>
> Test "read-latency-max"
> (rw=randread, iodepth=64, bs=4k)
> read_iops: 32500 -> 87000
>
> Test "write-latency-max"
> (rw=randwrite, iodepth=64, bs=4k)
> write_iops: 22500 -> 11500
>
> Test "write-throughput-iops-max"
> (rw=write, iodepth=64, bs=4k)
> write_iops: 7000 -> 14000
>
> Test "usecase1"
> (rw=randrw, bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/, rwmixread=1, rate_process=poisson, iodepth=64)
> write_iops: 21000 -> 8500
>
> Test "usecase1-readonly"
> (rw=randread, bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/, rate_process=poisson, iodepth=64)
> read_iops: 28000 -> 58000
>
> The last two tests represent a typical use case on our systems. Therefore we are especially concerned by the drop from 21000 w/ops to 8500 w/ops (about 60%) after upgrading to Ceph 15.2.3.
>
> We ran all tests several times; the values are averaged over all iterations and are fairly consistent and reproducible. We even tried wiping the whole cluster, downgrading to Ceph 14.2.9 again, setting up a new cluster/pool, running the tests, and upgrading to Ceph 15.2.3 again. The tests were performed on one of the three cluster nodes using a 50G rbd volume, which had been prefilled with random data before each test run.
>
> Have any changes been introduced with Octopus that could explain the observed changes in performance?
>
> What we already tried:
>
> - Disabling rbd cache
> - Reverting the rbd cache policy to writeback (the default in 14.2)
> - Setting the rbd io scheduler to none
> - Deploying a fresh cluster starting with Ceph 15.2.3
>
> Kernel is 5.4.38 … I don't know if other system specs would be helpful besides those already mentioned (since we are talking about a relative change in performance after upgrading Ceph without any further changes) - if so, please let us know.
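
For what it's worth, here is a rough sketch of how to check and compare that setting. Only the option name comes from the changelog entry and PR above; applying it at the "osd" config level and the exact test workflow are my assumptions, not something verified against your setup:

  # show the value the OSDs are currently using (per the changelog entry, 15.2.3 turns it off)
  ceph config get osd bluefs_preextend_wal_files

  # on a throwaway test cluster only: re-enable it, restart the OSDs and re-run the fio jobs
  # to see whether the write numbers recover; revert afterwards, since the bug mentioned
  # above is triggered with this option enabled
  ceph config set osd bluefs_preextend_wal_files true
  ceph config rm osd bluefs_preextend_wal_files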
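
And in case anyone else wants to reproduce the "usecase1" numbers quoted above, a minimal sketch of that job as a standalone fio invocation against an rbd image. Only the parameters quoted in the mail come from the original; the ioengine, client/pool/image names and runtime are placeholders of mine:

  # hypothetical pool/image names; point these at the 50G test volume described above
  fio --name=usecase1 \
      --ioengine=rbd --clientname=admin --pool=testpool --rbdname=testimage \
      --rw=randrw --rwmixread=1 --rate_process=poisson --iodepth=64 \
      --bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/ \
      --time_based --runtime=300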
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx