Re: avg apply latency went up after update from octopus to pacific

Mark Nelson <mark.a.nelson@xxxxxxxxx> · Tue, 28 Feb 2023 17:16:13 -0600

One thing to watch out for with bluefs_buffered_io is that disabling it 
can greatly impact certain rocksdb workloads.  From what I remember it 
was a huge problem during certain iteration workloads for things like 
collection listing.  I think the block cache was being invalidated or 
simply never cached the data properly, but the underlying code has 
changed quite a bit so it was tough to track down exactly how it worked 
in different versions of RocksDB.

We got bitten by this a couple of years ago when we switched to direct IO,
and it caused a lot of people trouble.  We ended up having to turn it back
on after lots of frustration and digging.  Basically the Linux page cache
is saving the day even though it really shouldn't be necessary.  It's
irritating because bluestore is otherwise faster in many scenarios (at
least with NVMe drives) when bluefs uses direct IO.

Mark

On 2/28/23 15:46, Boris Behrens wrote:
Hi Josh,
thanks a lot for the breakdown and the links.
I disabled the write cache but it didn't change anything. Tomorrow I will
try to disable bluefs_buffered_io.

It doesn't sound like I can mitigate the problem by adding more SSDs.

On Tue, 28 Feb 2023 at 15:42, Josh Baergen <
jbaergen@xxxxxxxxxxxxxxxx> wrote:

Hi Boris,

OK, what I'm wondering is whether
https://tracker.ceph.com/issues/58530 is involved. There are two
aspects to that ticket:
* A measurable increase in the number of bytes written to disk in
Pacific as compared to Nautilus
* The same, but for IOPS

Per the current theory, both are due to the loss of rocksdb log
recycling when using default recovery options in rocksdb 6.8; Octopus
uses version 6.1.2, Pacific uses 6.8.1.

16.2.11 largely addressed the bytes-written amplification, but the
IOPS amplification remains. In practice, whether this results in a
write performance degradation depends on the speed of the underlying
media and the workload, and thus the things I mention in the next
paragraph may or may not be applicable to you.
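
To check whether a cluster shows the two effects from the ticket, one
option is to compare device-level write rates for the same client
workload before and after the upgrade. The sketch below only does the
arithmetic; the WriteSample fields and the numbers in the usage note
are hypothetical, and where the counters come from (iostat,
/proc/diskstats, a metrics system) is left open.

```python
from dataclasses import dataclass


@dataclass
class WriteSample:
    """Write counters for one device over one measurement window (hypothetical shape)."""
    write_ops: int       # write requests completed in the window
    bytes_written: int   # bytes written in the window
    seconds: float       # length of the window


def amplification(before: WriteSample, after: WriteSample) -> tuple[float, float]:
    """Return (iops_ratio, bytes_ratio) of `after` relative to `before`.

    A ratio above 1.0 means the device did more work per second after
    the upgrade for what is assumed to be the same client workload.
    """
    iops_before = before.write_ops / before.seconds
    iops_after = after.write_ops / after.seconds
    bytes_before = before.bytes_written / before.seconds
    bytes_after = after.bytes_written / after.seconds
    return iops_after / iops_before, bytes_after / bytes_before
```

For example, going from 1000 writes / 4,000,000 bytes to 2000 writes /
5,000,000 bytes over equal 10-second windows gives an IOPS ratio of 2.0
and a bytes ratio of 1.25, i.e. the IOPS side dominates.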

There's no known workaround or solution for this at this time. In some
cases I've seen that disabling bluefs_buffered_io (which itself can
cause IOPS amplification in some cases) can help; I think most folks
do this by setting it in the local conf and then restarting the OSDs so
that the config change takes effect. Something else to consider is
https://docs.ceph.com/en/quincy/start/hardware-recommendations/#write-caches,
as sometimes disabling these write caches can improve the IOPS
performance of SSDs.
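
As a rough illustration of the "set it in local conf" step, the sketch
below adds bluefs_buffered_io = false to the [osd] section of a
ceph.conf-style text. The helper name set_osd_option is made up for
this example, and it assumes the file is plain INI that Python's
configparser can read. Note that configparser drops comments when it
writes the file back, so treat this as a way to see the resulting
fragment rather than as a tool for editing a production config. The
OSDs still have to be restarted afterwards.

```python
import configparser
import io


def set_osd_option(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key = value` set in the [osd] section.

    Hypothetical helper: assumes an INI-style ceph.conf. Comments in
    the input are not preserved in the output.
    """
    # interpolation=None so '%' in values is left alone;
    # strict=False tolerates repeated keys or sections.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("osd"):
        parser.add_section("osd")
    parser.set("osd", key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Example input; the fsid value is a placeholder, not a real cluster id.
    print(set_osd_option("[global]\nfsid = example\n", "bluefs_buffered_io", "false"))
```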

Josh

On Tue, Feb 28, 2023 at 7:19 AM Boris Behrens <bb@xxxxxxxxx> wrote:

Hi Josh,
we upgraded from 15.2.17 to 16.2.11, and we only run RBD workloads.

On Tue, 28 Feb 2023 at 15:00, Josh Baergen <
jbaergen@xxxxxxxxxxxxxxxx> wrote:

Hi Boris,

Which version did you upgrade from and to, specifically? And what
workload are you running (RBD, etc.)?

Josh

On Tue, Feb 28, 2023 at 6:51 AM Boris Behrens <bb@xxxxxxxxx> wrote:

Hi,
today I did the first update from Octopus to Pacific, and it looks like
the avg apply latency went up from 1ms to 2ms.

All 36 OSDs are 4TB SSDs and nothing else changed.
Does anyone know whether this is a known issue, or am I just missing a
config value?
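
For reference, a cluster-wide average like the one above can be
computed from the JSON output of "ceph osd perf --format json". The
sketch below is only an illustration: the function name is made up,
and the JSON layout (an osd_perf_infos list, possibly nested under
osdstats, with perf_stats.apply_latency_ms per OSD) is an assumption
that should be checked against the actual output of the release in
use.

```python
import json
from statistics import mean


def avg_apply_latency_ms(perf_json: str) -> float:
    """Average apply latency in ms across all OSDs in a perf dump.

    Assumed layout: {"osdstats": {"osd_perf_infos": [...]}} or, as a
    fallback, {"osd_perf_infos": [...]} at the top level.
    """
    doc = json.loads(perf_json)
    infos = doc.get("osdstats", doc).get("osd_perf_infos", [])
    values = [info["perf_stats"]["apply_latency_ms"] for info in infos]
    if not values:
        raise ValueError("no OSD perf entries found")
    return mean(values)
```

Feeding it the captured command output before and after an upgrade
gives two numbers that can be compared directly.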

Cheers
  Boris
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
The "UTF-8 problems" self-help group is meeting, as an exception, in the
"groÃƒ¼en Saal" (large hall) this time.
