Hi Dongdong,
Thanks a lot for your post, it's really helpful.
Thanks,
Igor
On 1/5/2023 6:12 AM, Dongdong Tao wrote:
I have recently seen many users report that they are struggling with
the Onode::put race condition issue [1] on both the latest Octopus and
Pacific releases.
Igor opened a PR [2] to address this issue. I've reviewed it carefully
and it looks good to me. I hope it can get some priority from the
community.
For those who have been hitting this issue, I would like to share a
workaround that may unblock you:
While investigating this issue, I found that the race condition always
occurs after the BlueStore onode cache size drops to 0. Setting
debug_bluestore = 1/30 lets you see the cache size after the crash:
---
2022-10-25T00:47:26.562+0000 7f424f78e700 30 bluestore.MempoolThread(0x564a9dae2a68) _resize_shards max_shard_onodes: 0 max_shard_buffer: 8388608
---
This is obviously wrong, as it means the BlueStore metadata cache is
effectively disabled. But it explains why we hit the race condition so
easily: an onode is trimmed right away after it is unpinned.
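If you run many OSDs, checking each log by hand is tedious. Below is a minimal Python sketch that scans OSD log lines for the _resize_shards message shown above and reports every occurrence where max_shard_onodes is 0. It assumes the message has exactly the format of the excerpt (on a single line, as in the log file, not wrapped as in this mail) and that debug_bluestore is set high enough (1/30) for it to be logged. The function name zero_onode_cache_lines is just illustrative.

```python
import re
from typing import Iterable, List, Tuple

# Assumes the single-line log format of the excerpt above, e.g.
# "... bluestore.MempoolThread(0x...) _resize_shards max_shard_onodes: 0 max_shard_buffer: 8388608"
RESIZE_RE = re.compile(
    r"_resize_shards\s+max_shard_onodes:\s*(\d+)\s+max_shard_buffer:\s*(\d+)"
)


def zero_onode_cache_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, line) for every _resize_shards message with max_shard_onodes == 0."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        match = RESIZE_RE.search(line)
        # group(1) is the max_shard_onodes value; 0 means no onode cache per shard
        if match and int(match.group(1)) == 0:
            hits.append((lineno, line.rstrip("\n")))
    return hits
```

For example, open an OSD log and pass the file object to zero_onode_cache_lines, then print the returned pairs. The usual log location on package-based installs is /var/log/ceph/ceph-osd.<id>.log, but treat that path as an assumption; containerized deployments log elsewhere.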
Digging further, it turns out that the culprit behind the 0-sized cache
is a leak in the bluestore_cache_other mempool. The bug tracker [3] has
the details of the leak. It has already been fixed by [4], and the fix
will be in the next Pacific point release.
However, it was never backported to Octopus.
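To check whether an OSD is affected by the leak, one option is to watch the bluestore_cache_other pool in the output of `ceph daemon osd.<id> dump_mempools` over time. The Python sketch below compares two such JSON dumps. It assumes the dump has the layout {"mempool": {"by_pool": {...}}} with per-pool "items" and "bytes" counters, which you should verify against your release; the function names pool_bytes and cache_other_growth are hypothetical.

```python
import json
from typing import Any, Dict

# Assumed layout of `ceph daemon osd.<id> dump_mempools` output (verify on your release):
# {"mempool": {"by_pool": {"<pool name>": {"items": N, "bytes": N}, ...},
#              "total": {"items": N, "bytes": N}}}


def pool_bytes(dump: Dict[str, Any], pool: str) -> int:
    """Return the 'bytes' counter of one mempool, or 0 if it is missing."""
    by_pool = dump.get("mempool", {}).get("by_pool", {})
    return int(by_pool.get(pool, {}).get("bytes", 0))


def cache_other_growth(before_json: str, after_json: str) -> int:
    """Bytes gained by bluestore_cache_other between two dump_mempools samples.

    A leak should show up as steady growth across samples taken some time
    apart; a single large reading on its own proves little.
    """
    before = pool_bytes(json.loads(before_json), "bluestore_cache_other")
    after = pool_bytes(json.loads(after_json), "bluestore_cache_other")
    return after - before
```

For example, save the dump_mempools output to a file, take another sample some hours later, and pass both file contents to cache_other_growth. Steady growth under a stable workload is a hint, not proof, that you are hitting the leak; the tracker entry [3] remains the authority on what it looks like.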
So if you are hitting the same issue:
If you are on Octopus, you can manually backport this patch to fix the
leak and prevent the race condition from happening.
If you are on Pacific, you can wait for the next Pacific point release.
By the way, I'm backporting the fix to Ubuntu's Octopus and Pacific
packages through this SRU [5], so it will land in Ubuntu soon.
[1] https://tracker.ceph.com/issues/56382
[2] https://github.com/ceph/ceph/pull/47702
[3] https://tracker.ceph.com/issues/56424
[4] https://github.com/ceph/ceph/pull/46911
[5] https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1996010
Cheers,
Dongdong
--
Igor Fedotov
Ceph Lead Developer
--
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx