Ok, so I just completed the upgrade to 16.2.6 and things are looking much better!

Now the average write speed for OSDs (via `ceph tell osd bench`) is around 120MB/s, and for a k=4, m=2 erasure coded pool I see over 650MB/s (via `rados bench`). Over an S3 connection it is around 450MB/s.

One thing I did notice is that with `ceph tell osd bench` all the OSDs in the local host report speeds of around 150MB/s while all the OSDs in the other host are around 100MB/s. Maybe this is just a latency issue in the benchmark.

Thanks again Igor!
Dustin

On Mon, Nov 01, 2021 at 04:31:59PM +0000, Dustin Lagoy wrote:
Thanks Igor! I'll try to upgrade and report back.

Dustin

On Mon, Nov 01, 2021 at 07:27:11PM +0300, Igor Fedotov wrote:
Then it's highly likely you're bitten by https://tracker.ceph.com/issues/52089

This has been fixed starting with 16.2.6. So please update, or wait a bit until 16.2.7 is released, which is going to happen shortly.

Thanks,
Igor

On 11/1/2021 7:25 PM, Dustin Lagoy wrote:
> I am running a cephadm-based cluster with all images on 16.2.5.
>
> Thanks for the quick response!
> Dustin
>
> On Mon, Nov 01, 2021 at 07:18:38PM +0300, Igor Fedotov wrote:
> Hey Dustin,
>
> what Pacific version have you got?
>
> Thanks,
> Igor
>
> On 11/1/2021 7:08 PM, Dustin Lagoy wrote:
>> Hi everyone,
>>
>> This is my first time posting here, so it's nice to meet you all!
>>
>> I have a Ceph cluster that was recently upgraded from Octopus to Pacific and now the write performance is noticeably worse. We have an application which continually writes 270MB/s through the RGW, which was working fine before the upgrade. Now it struggles to write 170MB/s continuously.
>>
>> I don't have detailed benchmarks from before the upgrade other than what I just mentioned, but I have investigated some since. For background, the cluster is two hosts with 10 HDD OSDs each, which is under 5% usage overall. The RGW pool is set up with a k=4, m=2 erasure code (across OSDs).
>>
>> Profiling with `rados bench` to a pool with the same erasure code setup gives similar 170MB/s performance (so about 42MB/s per disk). Profiling a single OSD using `ceph tell osd.0 bench` yields an average of 40MB/s across all disks (which should yield close to 160MB/s at the pool level). Given this, it seems to me to be an issue at the OSD level.
>>
>> I tried setting `bluefs_buffered_io` to false as mentioned elsewhere. This reduced the `%wrqm` reported by iostat (which was previously close to 100%) and gave a slight performance gain (to around 50MB/s per OSD), but nothing close to what was seen previously.
>>
>> Before and after the above change, both iostat and iotop report over 100MB/s written to disk while a single `ceph tell osd bench` command is running, and iotop shows a near threefold write amplification. With the benchmark writing 1GB to disk, iotop shows both the `bstore_kv_sync` and `bstore_kv_final` threads writing about 1GB each and the threads `rocksdb:high0` and `rocksdb:low0` writing a total of 1GB. So 3GB total for the 1GB benchmark.
>>
>> Looking at the OSD logs during the benchmark, it seems rocksdb is compacting several times. I tried adding sharding to one of the OSDs as mentioned in the documentation (with `ceph-bluestore-tool`) but it didn't seem to make a difference.
>>
>> Does anyone have any idea what may have caused this performance loss? I am happy to post any more logs/detail if it would help.
>>
>> Thanks!
>> Dustin
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
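
For anyone checking the back-of-the-envelope numbers in the original report, the arithmetic can be sketched roughly as below. This is only an illustration: the function names are made up for this note, and the model is the same simple one used in the report (client throughput divided across the k data chunks, ignoring metadata and any other overhead). The raw-rate helper is an added assumption that every client byte is expanded by (k+m)/k on disk.

```python
def per_data_chunk_rate(client_mb_s: float, k: int) -> float:
    """Client write rate split evenly across the k data chunks (MB/s)."""
    return client_mb_s / k


def pool_rate_from_osd(osd_mb_s: float, k: int) -> float:
    """Rough pool-level estimate from a single-OSD bench result (MB/s)."""
    return osd_mb_s * k


def raw_write_rate(client_mb_s: float, k: int, m: int) -> float:
    """Assumed total rate hitting all k+m OSDs, data plus coding chunks (MB/s)."""
    return client_mb_s * (k + m) / k


def write_amplification(written_gb: float, requested_gb: float) -> float:
    """Ratio of bytes seen on disk to bytes the benchmark asked to write."""
    return written_gb / requested_gb


k, m = 4, 2

# 170MB/s from rados bench on the k=4, m=2 pool -> roughly 42MB/s per disk
print(per_data_chunk_rate(170, k))

# 40MB/s per OSD from ceph tell osd bench -> roughly 160MB/s at the pool level
print(pool_rate_from_osd(40, k))

# Under the (k+m)/k assumption, 170MB/s of client writes is 255MB/s raw
print(raw_write_rate(170, k, m))

# iotop: ~1GB (bstore_kv_sync) + ~1GB (bstore_kv_final) + ~1GB (rocksdb threads)
# for a 1GB benchmark -> roughly threefold amplification
print(write_amplification(1 + 1 + 1, 1))
```

The same helpers applied to the post-upgrade figures (120MB/s per OSD, 650MB/s on the pool) give a comparable per-chunk picture, though real pool throughput depends on many factors this sketch does not model.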