Hi Mark,
Thanks a lot for highlighting this issue. I have two questions:
1) In the patch comments:
/"but we fail to populate this setting down when building external
projects. this is important when it comes to the projects which is
critical to the performance. RocksDB is one of them."/
Do we have similar issues with other sub-projects, such as Boost or SPDK?
2) The chart of "rocksdb submit latency" shows a drop from over 10 ms to
below 5 ms. Was this measured during write I/O under heavy load?
/Maged
On 08/02/2024 20:04, Mark Nelson wrote:
Hi Folks,
Recently we discovered a flaw in how the upstream Ubuntu and Debian
builds of Ceph compile RocksDB. It causes a variety of performance
issues, including slower-than-expected write performance, 3x longer
compaction times, and significantly higher-than-expected CPU
utilization when RocksDB is heavily utilized. The issue has now been
fixed in main. Igor Fedotov, however, observed during the performance
meeting today that there were no backports for the fix in place. He
also rightly pointed out that it would be helpful to make an
announcement about the issue given the severity for the affected
users. I wanted to give a bit more background and make sure people are
aware and understand what's going on.
1) Who's affected?
Anyone running an upstream Ubuntu/Debian build of Ceph from the last
several years. External builds from Canonical and Gentoo suffered
from this issue as well, but were fixed independently.
2) How can you check?
There's no easy way to tell at the moment. We are investigating whether
running "strings" on the OSD executable may provide a clue. For now,
assume that if you are using our Debian/Ubuntu builds in a
non-container configuration, you are affected. Proxmox, for instance,
was affected prior to adopting the fix.
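For anyone who wants to experiment with the "strings" idea in the meantime, here is a minimal sketch of the same kind of scan in Python. It only illustrates the mechanics: no string is currently known to identify an affected build, so the keyword "rocksdb" and the path /usr/bin/ceph-osd in the usage note are assumptions, and the function names are made up for this example.

```python
import re
from pathlib import Path


def printable_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII at least min_len long, roughly like strings(1)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def matching_strings(data: bytes, keyword: str, min_len: int = 4):
    """Return the printable strings that contain keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [s for s in printable_strings(data, min_len) if keyword in s.lower()]


def scan_file(path: str, keyword: str = "rocksdb"):
    """Read a binary from disk and return its strings that mention keyword.

    The default keyword is only a starting point for manual inspection;
    it is not a confirmed marker of an affected build.
    """
    return matching_strings(Path(path).read_bytes(), keyword)
```

Calling scan_file("/usr/bin/ceph-osd") (assuming that is where your OSD binary lives) returns the embedded strings that mention RocksDB, which you can then read through by hand. Whether any of them distinguishes an affected build is still an open question.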
3) Are Cephadm deployments affected?
Not as far as we know. Ceph container builds are compiled slightly
differently from stand-alone Debian builds. They do not appear to
suffer from the bug.
4) What versions of Ceph will get the fix?
Casey Bodley kindly offered to backport the fix to both Reef and
Quincy. He also verified that the fix builds properly with Pacific.
We now have 3 separate backport PRs for the releases here:
https://github.com/ceph/ceph/pull/55500
https://github.com/ceph/ceph/pull/55501
https://github.com/ceph/ceph/pull/55502
Please feel free to reply if you have any questions!
Thanks,
Mark