* bluestore: common/options.cc: disable bluefs_preextend_wal_files  <-- from the 15.2.3 changelog

There was a bug which led to issues on OSD restart, and I believe this was an attempt at mitigation until a proper bugfix could be put into place. I suspect this might be the cause of the symptoms you're seeing (see the notes after the quoted message below for a quick way to check).

https://tracker.ceph.com/issues/45613
https://github.com/ceph/ceph/pull/35293

On Thu, Jun 4, 2020 at 8:07 AM Thomas Gradisnik <tg@xxxxxxxxx> wrote:
> We have deployed a small test cluster consisting of three nodes. Each node is running a mon/mgr and two osds (Samsung PM983 3.84TB NVMe split into two partitions), so six osds in total. We started with Ceph 14.2.7 some weeks ago (upgraded to 14.2.9 later) and ran different tests using fio against some rbd volumes in order to get an overview of what performance we could expect. The configuration is unchanged compared to the defaults; we only set several debugging options to 0/0.
>
> Yesterday we upgraded the whole cluster to Ceph 15.2.3, following the upgrade guidelines, which has worked without any problems so far. Nevertheless, when running the same tests we had previously run on Ceph 14.2.9, we are seeing clear degradations in write performance (alongside some performance improvements, which should also be mentioned).
>
> Here are the results of concern (each with the relevant fio settings used):
>
> Test "read-latency-max"
> (rw=randread, iodepth=64, bs=4k)
> read_iops: 32500 -> 87000
>
> Test "write-latency-max"
> (rw=randwrite, iodepth=64, bs=4k)
> write_iops: 22500 -> 11500
>
> Test "write-throughput-iops-max"
> (rw=write, iodepth=64, bs=4k)
> write_iops: 7000 -> 14000
>
> Test "usecase1"
> (rw=randrw, bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/, rwmixread=1, rate_process=poisson, iodepth=64)
> write_iops: 21000 -> 8500
>
> Test "usecase1-readonly"
> (rw=randread, bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/, rate_process=poisson, iodepth=64)
> read_iops: 28000 -> 58000
>
> The last two tests represent a typical use case on our systems. Therefore we are especially concerned by the drop from 21000 w/ops to 8500 w/ops (about 60%) after upgrading to Ceph 15.2.3.
>
> We ran all tests several times; the values are averaged over all iterations and are fairly consistent and reproducible. We even tried wiping the whole cluster, downgrading to Ceph 14.2.9 again, setting up a new cluster/pool, running the tests, and upgrading to Ceph 15.2.3 again. The tests were performed on one of the three cluster nodes using a 50G rbd volume, which had been prefilled with random data before each test run.
>
> Have any changes been introduced with Octopus that could explain the observed changes in performance?
>
> What we already tried:
>
> - Disabling rbd cache
> - Reverting the rbd cache policy to writeback (the default in 14.2)
> - Setting the rbd io scheduler to none
> - Deploying a fresh cluster starting with Ceph 15.2.3
>
> Kernel is 5.4.38 … I don't know if other system specs would be helpful besides those already mentioned (since we are talking about a relative change in performance after upgrading Ceph without any further changes) - if so, please let us know.
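
For what it's worth, here is a rough sketch of how to check and compare that setting. Only the option name comes from the changelog entry and PR above; applying it at the "osd" config level and the exact test workflow are my assumptions, not something verified against your setup:

  # show the value the OSDs are currently using (per the changelog entry, 15.2.3 turns it off)
  ceph config get osd bluefs_preextend_wal_files

  # on a throwaway test cluster only: re-enable it, restart the OSDs and re-run the fio jobs
  # to see whether the write numbers recover; revert afterwards, since the bug mentioned
  # above is triggered with this option enabled
  ceph config set osd bluefs_preextend_wal_files true
  ceph config rm osd bluefs_preextend_wal_files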
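
And in case anyone else wants to reproduce the "usecase1" numbers quoted above, a minimal sketch of that job as a standalone fio invocation against an rbd image. Only the parameters quoted in the mail come from the original; the ioengine, client/pool/image names and runtime are placeholders of mine:

  # hypothetical pool/image names; point these at the 50G test volume described above
  fio --name=usecase1 \
      --ioengine=rbd --clientname=admin --pool=testpool --rbdname=testimage \
      --rw=randrw --rwmixread=1 --rate_process=poisson --iodepth=64 \
      --bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/ \
      --time_based --runtime=300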
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx