Hi Mark,
Thanks a lot for highlighting this issue. I have two questions:
1) In the patch comments:
"but we fail to populate this setting down when building external projects. this is important when it comes to the projects which is critical to the performance. RocksDB is one of them."
Do we have similar issues with other sub-projects, such as Boost or SPDK?
2) The chart for "rocksdb submit latency" shows it going from over 10 ms to below 5 ms. Is this during write I/O under heavy load?
/Maged
On 08/02/2024 20:04, Mark Nelson wrote:
Hi Folks,
Recently we discovered a flaw in how the upstream Ubuntu and Debian builds of Ceph compile RocksDB. It causes a variety of performance issues including slower than expected write performance, 3X longer compaction times, and significantly higher than expected CPU utilization when RocksDB is heavily utilized. The issue has now been fixed in main. Igor Fedotov, however, observed during the performance meeting today that there were no backports for the fix in place. He also rightly pointed out that it would be helpful to make an announcement about the issue given the severity for the affected users. I wanted to give a bit more background and make sure people are aware and understand what's going on.
1) Who's affected?
Anyone running an upstream Ubuntu/Debian build of Ceph from the last several years. External builds from Canonical and Gentoo suffered from this issue as well, but were fixed independently.
2) How can you check?
There's no easy way to tell at the moment. We are investigating whether running "strings" on the OSD executable may provide a clue. For now, assume that if you are using our Debian/Ubuntu builds in a non-container configuration, you are affected. Proxmox, for instance, was affected prior to adopting the fix.
3) Are Cephadm deployments affected?
Not as far as we know. Ceph container builds are compiled slightly differently from stand-alone Debian builds. They do not appear to suffer from the bug.
4) What versions of Ceph will get the fix?
Casey Bodley kindly offered to backport the fix to both Reef and Quincy. He also verified that the fix builds properly with Pacific. We now have 3 separate backport PRs for the releases here:
https://github.com/ceph/ceph/pull/55500
https://github.com/ceph/ceph/pull/55501
https://github.com/ceph/ceph/pull/55502
Please feel free to reply if you have any questions!
Thanks,
Mark