On Wed, Sep 5, 2018 at 11:59 AM David Zafman <dzafman@xxxxxxxxxx> wrote:
>
>
> Kefu,
>
> With a vstart.sh cluster the ceph-osdomap-tool is broken. It might be
> related to change e406d8eb9e1deb801ecb346169eaaf96adbb4b53 which changed
> the locking.
>

David,

I tried to reproduce this issue in an Ubuntu 16.04 docker container with GCC 7.3 and in Debian sid with GCC 8.2, using up-to-date master and dzafman:wip-23875. None of the 4 combinations crashes. I tested with the following steps:

$ MDS=0 MGR=1 OSD=3 MON=3 ../src/vstart.sh -X -n --filestore
$ bin/init-ceph stop osd.0
$ bin/ceph-osdomap-tool --no-mon-config --omap-path dev/osd0/current/omap --command dump-objects
Version: 3
Seq: 1
legacy: false

I also used gdb to launch the executable and set a breakpoint at ceph_osdomap_tool.cc:80 to make sure that this line is executed, and it was.

Is there any specific setting you are using when building Ceph? My configure is:

cmake -DCMAKE_BUILD_TYPE=Debug -DWITH_MGR_DASHBOARD_FRONTEND=OFF -DBOOST_J=8 -DWITH_DPDK=OFF -DWITH_SPDK=OFF -DWITH_SEASTAR=ON -DENABLE_GIT_VERSION=OFF -DCMAKE_INSTALL_PREFIX:PATH=$HOME/.local -DWITH_PYTHON3=ON -DMGR_PYTHON_VERSION=3 ..

> $ gdb bin/ceph-osdomap-tool
> GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
> Copyright (C) 2016 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from bin/ceph-osdomap-tool...done.
> (gdb) run --no-mon-config --omap-path dev/osd0/current/omap --command
> dump-objects
> Starting program: /src/ceph/build/bin/ceph-osdomap-tool --no-mon-config
> --omap-path dev/osd0/current/omap --command dump-objects
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> [New Thread 0x7fffeaea2700 (LWP 33725)]
> ceph-osdomap-tool: ../nptl/pthread_mutex_lock.c:81:
> __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
>
> Thread 1 "ceph-osdomap-to" received signal SIGABRT, Aborted.
> 0x00007fffed3eb428 in __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:54
> 54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0 0x00007fffed3eb428 in __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:54
> #1 0x00007fffed3ed02a in __GI_abort () at abort.c:89
> #2 0x00007fffed3e3bd7 in __assert_fail_base (fmt=<optimized out>,
> assertion=assertion@entry=0x7fffee446015 "mutex->__data.__owner == 0",
> file=file@entry=0x7fffee445ff8 "../nptl/pthread_mutex_lock.c",
> line=line@entry=81, function=function@entry=0x7fffee446180
> <__PRETTY_FUNCTION__.8623> "__pthread_mutex_lock") at assert.c:92
> #3 0x00007fffed3e3c82 in __GI___assert_fail
> (assertion=assertion@entry=0x7fffee446015 "mutex->__data.__owner == 0",
> file=file@entry=0x7fffee445ff8 "../nptl/pthread_mutex_lock.c",
> line=line@entry=81, function=function@entry=0x7fffee446180
> <__PRETTY_FUNCTION__.8623> "__pthread_mutex_lock") at assert.c:101
> #4 0x00007fffee43cf68 in __GI___pthread_mutex_lock
> (mutex=mutex@entry=0x5555567b1620) at ../nptl/pthread_mutex_lock.c:81
> #5 0x00007fffee8eef49 in Mutex::Lock (this=this@entry=0x5555567b15f8,
> no_lockdep=no_lockdep@entry=false) at
> /home/dzafman/ceph/src/common/Mutex.cc:107
> #6 0x000055555574e661 in Mutex::Locker::Locker (m=..., this=<synthetic
> pointer>) at /home/dzafman/ceph/src/common/Mutex.h:116
> #7 ConfigProxy::parse_config_files (flags=0, warnings=<optimized out>,
> conf_files=0x0, this=0x5555567ae008) at
> /home/dzafman/ceph/src/common/config_proxy.h:199
> #8 global_pre_init (defaults=<optimized out>, args=std::vector of
> length 1, capacity 1 = {...}, module_type=<optimized out>,
> code_env=code_env@entry=CODE_ENVIRONMENT_UTILITY_NODOUT,
> flags=flags@entry=0) at /home/dzafman/ceph/src/global/global_init.cc:114
> #9 0x000055555574eba7 in global_init (defaults=<optimized out>,
> args=..., module_type=<optimized out>,
> code_env=CODE_ENVIRONMENT_UTILITY_NODOUT, flags=0, data_dir_option=0x0,
> run_pre_init=true) at /home/dzafman/ceph/src/global/global_init.cc:176
> #10 0x0000555555643a7d in main (argc=<optimized out>, argv=<optimized
> out>) at /home/dzafman/ceph/src/tools/ceph_osdomap_tool.cc:80
>
>
> commit e406d8eb9e1deb801ecb346169eaaf96adbb4b53
> Author: Kefu Chai <kchai@xxxxxxxxxx>
> Date: Sun Jul 15 16:49:59 2018 +0800
>
>     common/config: promote lock from md_config_t to ConfigProxy
>
>     seastar's ConfigProxy and alien's ConfigProxy follow different threading
>     models and expose different methods. the former updates a setting with 3
>     steps:
>     1. create a local copy of current setting, and apply the proposed change
>        to the copy
>     2. populate the updated change with a foreign_ptr<> to all shards
>        (including itself)
>     3. on each shards, call apply_changes() to get the interested observers
>        updated, please note, apply_changes() should only update the local
>        observers on current shard.
>
>     while the alien's ConfigProxy do all the job in a single synchronized
>     call, but we can split it into a finer-grained steps:
>     1. apply the proposed change in-place
>     2. apply_changes() to get the interested observers updated.
>
>     so, to reuse the code across these two implementations, for instance,
>     set_mon_vals() will be implemented in ConfigProxy instead, so we can
>     have different behavior in different ConfigProxy classes.
>     if we keep using the existing single-piece md_config_t::set_mon_vals(),
>     we have no chance to differentiate the apply_changes() for seastar port.
>     but the alien implementation requires a grand lock protecting set_val()
>     and apply_changes(), so we have to move the lock from md_config_t up to
>     ConfigProxy. it's also simpler this way, as we don't need an extra layer
>     to have a dummy Mutex for seastar's ConfigProxy. as only the alien's
>     ConfigProxy requires the lock.
>
>     Signed-off-by: Kefu Chai <kchai@xxxxxxxxxx>
>
> David

--
Regards
Kefu Chai
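
For readers following the quoted commit message, here is a minimal sketch of the "promote the lock to the proxy" idea it describes: one lock on the proxy covers both set_val() and apply_changes(), and the inner config object holds no lock. This is an illustration only, not the Ceph code. PlainConfig and LockedConfigProxy are hypothetical stand-ins for md_config_t and the alien ConfigProxy, set_val_and_apply() is an invented name, and std::mutex is used in place of Ceph's Mutex.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for md_config_t: stores values, owns no lock.
struct PlainConfig {
  std::map<std::string, std::string> values;
  std::vector<std::string> pending;  // keys changed since the last apply

  void set_val(const std::string& key, const std::string& val) {
    values[key] = val;
    pending.push_back(key);
  }
};

// Hypothetical stand-in for the lock-holding (alien) ConfigProxy.
class LockedConfigProxy {
  using Observer =
      std::function<void(const std::string&, const std::string&)>;

  PlainConfig config;
  std::mutex lock;  // the single "grand lock", owned by the proxy
  std::vector<Observer> observers;

  // Caller must already hold `lock`.
  void apply_changes_locked() {
    for (const auto& key : config.pending)
      for (auto& obs : observers)
        obs(key, config.values[key]);
    config.pending.clear();
  }

public:
  void add_observer(Observer obs) {
    std::lock_guard<std::mutex> guard(lock);
    observers.push_back(std::move(obs));
  }

  // Step 1 (change in place) and step 2 (notify observers) under one lock.
  void set_val_and_apply(const std::string& key, const std::string& val) {
    std::lock_guard<std::mutex> guard(lock);
    config.set_val(key, val);
    apply_changes_locked();
    assert(config.pending.empty());
  }

  // Returns an empty string when the key is not set.
  std::string get_val(const std::string& key) {
    std::lock_guard<std::mutex> guard(lock);
    auto it = config.values.find(key);
    return it == config.values.end() ? std::string() : it->second;
  }
};
```

In this sketch the observers run while the lock is held, so an observer that calls back into the proxy would deadlock on the non-recursive std::mutex. A seastar-style proxy, as the commit message notes, would not need the lock at all.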