On 07/09/2018 14:28, Sage Weil wrote:
On Fri, 7 Sep 2018, Xiangyang Yu wrote:
Hi all,
In our production cluster, we use jewel 10.2.10. We use jemalloc to
allocate memory.
These days we are trying to add rocksdb support to osd and monitor,
Be aware that there is a known problem with rocksdb and jemalloc that
causes a crash; see http://tracker.ceph.com/issues/20557
That appears to be a different issue than the compilation problem you are
seeing. Assuming you get past that, though, I would expect you to hit the
#20557 bug anyway.
Starting with luminous we've recommended users stop using jemalloc because
the switch to AsyncMessenger wipes away the benefit users were seeing in
jewel; tcmalloc and jemalloc now perform about the same (when jemalloc
isn't crashing :).
Hmmm,
That made me start to wonder why the FreeBSD version has not been bitten
by this, since jemalloc is the default malloc there.
But everything is linked against libtcmalloc.so, which could/should
prevent that.
Is this code path typical for BlueStore and not for FileStore?
That would be the other explanation.
But then again, I submitted some fixes to rocksdb to get the handling of
jemalloc being in libc including/linking the right way. So it could be
that malloc in rocksdb then started using the libc malloc (aka the
jemalloc variant).
Do we know what the bug in the rocksdb/jemalloc combination is? Or was
it solved by going to tcmalloc, without an in-depth understanding of what
was going on?
Putting it on the list to figure out.
--WjW
sage
I have merged the commit below but failed to compile the code,
https://github.com/ceph/ceph/pull/18010
The screen shows:
src/rocksdb/db/db_impl.cc:401 undefined reference to 'malloc_stats_print'
Make[3] : [ceph_test_keyvaluedb] ERROR 1
src/rocksdb/db/db_impl.cc:401 undefined reference to 'malloc_stats_print'
Make[3] : [ceph_osdmap_tool] ERROR 1
src/rocksdb/db/db_impl.cc:401 undefined reference to 'malloc_stats_print'
Make[3] : [ceph_kvstore_tool] ERROR 1
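
For context on the errors above: malloc_stats_print is part of the jemalloc
API, so a binary that references it has to be linked against a library that
defines it. The sketch below is only a rough runtime analogue of that
link-time check, assuming a POSIX system where dlopen() works; has_symbol is
a hypothetical helper name, not something from Ceph or rocksdb. It reports
whether a symbol resolves in the libraries loaded into the current process.

```python
import ctypes


def has_symbol(name: str) -> bool:
    """Return True if `name` resolves in the current process image.

    Hypothetical helper for illustration; assumes a POSIX dlopen().
    """
    # CDLL(None) asks dlopen() for the main program handle, so the lookup
    # sees every shared library already loaded into this process.
    process = ctypes.CDLL(None)
    try:
        getattr(process, name)  # ctypes raises AttributeError if unresolved
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    # malloc should always resolve; malloc_stats_print is expected to
    # resolve only when a jemalloc-based allocator is loaded.
    for sym in ("malloc", "malloc_stats_print"):
        print(sym, "found" if has_symbol(sym) else "missing")
```

If malloc_stats_print is reported missing, the process is probably not
running on a jemalloc-based allocator, which would be consistent with the
linker being unable to resolve the symbol for those targets.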
Then I found a related commit that was merged in Luminous:
cmake: should link against ${ALLOC_LIBS}
https://github.com/ceph/ceph/pull/11978/files
But this commit did not resolve my problem; the errors still exist.
But when I compile 10.2.11, no errors show, which is very surprising. All
the makefiles seem to be the same.
I have spent a day trying to solve the problem, with no outcome.
I must be missing some commits. Does anyone have any clues?
Best wishes,
brandy