Degradation of write-performance after upgrading to Octopus

We have deployed a small test cluster of three nodes. Each node runs a mon/mgr and two OSDs (a Samsung PM983 3.84 TB NVMe split into two partitions), so six OSDs in total. We started with Ceph 14.2.7 a few weeks ago (later upgraded to 14.2.9) and ran various fio tests against some rbd volumes to get an overview of the performance we could expect. The configuration is unchanged from the defaults, except that we set several debug options to 0/0.
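For reference, those debug settings are plain ceph.conf entries of the following form (the exact list of debug_* subsystems here is just illustrative):

  [global]
  debug_ms = 0/0          # log level / in-memory level
  debug_osd = 0/0
  debug_bluestore = 0/0
  debug_rocksdb = 0/0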

Yesterday we upgraded the whole cluster to Ceph 15.2.3 following the upgrade guidelines, which worked without any problems so far. However, when running the same tests we had run on Ceph 14.2.9, we are seeing a clear degradation in write performance (alongside some improvements, which are also listed below).

Here are the results of concern, each with the relevant fio settings used (values are Ceph 14.2.9 -> 15.2.3):

Test "read-latency-max"
(rw=randread, iodepth=64, bs=4k)
read_iops: 32500 -> 87000

Test "write-latency-max"
(rw=randwrite, iodepth=64, bs=4k)
write_iops: 22500 -> 11500

Test "write-throughput-iops-max"
(rw=write, iodepth=64, bs=4k)
write_iops: 7000 -> 14000

Test "usecase1"
(rw=randrw, bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/, rwmixread=1, rate_process=poisson, iodepth=64)
write_iops: 21000 -> 8500

Test "usecase1-readonly"
(rw=randread, bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/, rate_process=poisson, iodepth=64)
read_iops: 28000 -> 58000

The last two tests represent a typical use case on our systems, so we are especially concerned by the drop from 21000 to 8500 write IOPS (about 60%) after upgrading to Ceph 15.2.3.
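For completeness, each test is a plain fio job with the settings listed above, run against the rbd volume; assuming the image is mapped as a krbd block device (which the io scheduler point further below implies), the "write-latency-max" case corresponds roughly to a job like this (device name, ioengine and runtime are placeholders, not our exact job file):

  [write-latency-max]
  filename=/dev/rbd0      # mapped 50G test image (placeholder)
  ioengine=libaio
  direct=1
  rw=randwrite
  bs=4k
  iodepth=64
  time_based=1
  runtime=300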

We ran all tests several times; the values above are averaged over all iterations and are fairly consistent and reproducible. We even tried wiping the whole cluster, downgrading to Ceph 14.2.9, setting up a new cluster/pool, running the tests, and then upgrading to Ceph 15.2.3 again. All tests were performed on one of the three cluster nodes against a 50G rbd volume, which had been prefilled with random data before each test run.
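(In case anyone wants to reproduce this: the prefill amounts to one full sequential write pass of random data over the image, e.g. something along the lines of the following; the exact parameters are illustrative:

  fio --name=prefill --filename=/dev/rbd0 --rw=write --bs=4M --iodepth=16 --ioengine=libaio --direct=1 --refill_buffers=1
)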

Have any changes been introduced with Octopus that could explain the observed changes in performance?

What we already tried (the first three points are sketched as commands after this list):

- Disabling the rbd cache
- Reverting the rbd cache policy to writeback (the default in 14.2)
- Setting the rbd device's io scheduler to none
- Deploying a fresh cluster starting directly on Ceph 15.2.3
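Concretely, those settings were applied along the following lines (shown here via ceph config set and sysfs; the device name is a placeholder, and the exact mechanism matters less than the approach):

  # disable the rbd cache for clients
  ceph config set client rbd_cache false

  # revert the cache policy to writeback (the 14.2 default, per the list above)
  ceph config set client rbd_cache_policy writeback

  # switch the mapped krbd device to the "none" scheduler
  echo none > /sys/block/rbd0/queue/scheduler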

The kernel is 5.4.38. I don't know whether any other system specs would be helpful beyond those already mentioned (since we are talking about a relative change in performance after upgrading Ceph, with no other changes) - if so, please let us know.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



