Re: crash using ceph-osdomap-tool

On Wed, Sep 5, 2018 at 11:59 AM David Zafman <dzafman@xxxxxxxxxx> wrote:
>
>
> Kefu,
>
> With a vstart.sh cluster the ceph-osdomap-tool is broken.  It might be
> related to change e406d8eb9e1deb801ecb346169eaaf96adbb4b53 which changed
> the locking.
>

David, i tried to reproduce this issue in an Ubuntu 16.04 docker container
with GCC 7.3 and in Debian sid with GCC 8.2, using up-to-date master and
dzafman:wip-23875. none of the 4 combinations crashes. i tested with the
following steps:

$ MDS=0 MGR=1 OSD=3 MON=3 ../src/vstart.sh -X -n --filestore
$ bin/init-ceph stop osd.0
$ bin/ceph-osdomap-tool --no-mon-config \
    --omap-path dev/osd0/current/omap --command dump-objects
Version: 3
Seq: 1
legacy: false

i also launched the executable under gdb and set a breakpoint at
ceph_osdomap_tool.cc:80 to make sure that this line is executed, and
it was. is there any specific setting you are using when building
Ceph?

my cmake configuration is:
cmake -DCMAKE_BUILD_TYPE=Debug -DWITH_MGR_DASHBOARD_FRONTEND=OFF \
  -DBOOST_J=8 -DWITH_DPDK=OFF -DWITH_SPDK=OFF -DWITH_SEASTAR=ON \
  -DENABLE_GIT_VERSION=OFF -DCMAKE_INSTALL_PREFIX:PATH=$HOME/.local \
  -DWITH_PYTHON3=ON -DMGR_PYTHON_VERSION=3 ..


> $ gdb bin/ceph-osdomap-tool
> GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
> Copyright (C) 2016 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from bin/ceph-osdomap-tool...done.
> (gdb) run --no-mon-config --omap-path dev/osd0/current/omap --command
> dump-objects
> Starting program: /src/ceph/build/bin/ceph-osdomap-tool --no-mon-config
> --omap-path dev/osd0/current/omap --command dump-objects
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> [New Thread 0x7fffeaea2700 (LWP 33725)]
> ceph-osdomap-tool: ../nptl/pthread_mutex_lock.c:81:
> __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
>
> Thread 1 "ceph-osdomap-to" received signal SIGABRT, Aborted.
> 0x00007fffed3eb428 in __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:54
> 54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  0x00007fffed3eb428 in __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:54
> #1  0x00007fffed3ed02a in __GI_abort () at abort.c:89
> #2  0x00007fffed3e3bd7 in __assert_fail_base (fmt=<optimized out>,
> assertion=assertion@entry=0x7fffee446015 "mutex->__data.__owner == 0",
> file=file@entry=0x7fffee445ff8 "../nptl/pthread_mutex_lock.c",
> line=line@entry=81, function=function@entry=0x7fffee446180
> <__PRETTY_FUNCTION__.8623> "__pthread_mutex_lock") at assert.c:92
> #3  0x00007fffed3e3c82 in __GI___assert_fail
> (assertion=assertion@entry=0x7fffee446015 "mutex->__data.__owner == 0",
> file=file@entry=0x7fffee445ff8 "../nptl/pthread_mutex_lock.c",
> line=line@entry=81, function=function@entry=0x7fffee446180
> <__PRETTY_FUNCTION__.8623> "__pthread_mutex_lock") at assert.c:101
> #4  0x00007fffee43cf68 in __GI___pthread_mutex_lock
> (mutex=mutex@entry=0x5555567b1620) at ../nptl/pthread_mutex_lock.c:81
> #5  0x00007fffee8eef49 in Mutex::Lock (this=this@entry=0x5555567b15f8,
> no_lockdep=no_lockdep@entry=false) at
> /home/dzafman/ceph/src/common/Mutex.cc:107
> #6  0x000055555574e661 in Mutex::Locker::Locker (m=..., this=<synthetic
> pointer>) at /home/dzafman/ceph/src/common/Mutex.h:116
> #7  ConfigProxy::parse_config_files (flags=0, warnings=<optimized out>,
> conf_files=0x0, this=0x5555567ae008) at
> /home/dzafman/ceph/src/common/config_proxy.h:199
> #8  global_pre_init (defaults=<optimized out>, args=std::vector of
> length 1, capacity 1 = {...}, module_type=<optimized out>,
> code_env=code_env@entry=CODE_ENVIRONMENT_UTILITY_NODOUT,
> flags=flags@entry=0) at /home/dzafman/ceph/src/global/global_init.cc:114
> #9  0x000055555574eba7 in global_init (defaults=<optimized out>,
> args=..., module_type=<optimized out>,
> code_env=CODE_ENVIRONMENT_UTILITY_NODOUT, flags=0, data_dir_option=0x0,
> run_pre_init=true) at /home/dzafman/ceph/src/global/global_init.cc:176
> #10 0x0000555555643a7d in main (argc=<optimized out>, argv=<optimized
> out>) at /home/dzafman/ceph/src/tools/ceph_osdomap_tool.cc:80
>
>
> commit e406d8eb9e1deb801ecb346169eaaf96adbb4b53
> Author: Kefu Chai <kchai@xxxxxxxxxx>
> Date:   Sun Jul 15 16:49:59 2018 +0800
>
>      common/config: promote lock from md_config_t to ConfigProxy
>
>      seastar's ConfigProxy and alien's ConfigProxy follow different threading
>      models and expose different methods. the former updates a setting with 3
>      steps:
>      1. create a local copy of current setting, and apply the proposed change
>         to the copy
>      2. populate the updated change with a foreign_ptr<> to all shards
>         (including itself)
>      3. on each shards, call apply_changes() to get the interested observers
>         updated, please note, apply_changes() should only update the local
>         observers on current shard.
>
>      while the alien's ConfigProxy do all the job in a single synchronized
>      call,
>      but we can split it into a finer-grained steps:
>      1. apply the proposed change in-place
>      2. apply_changes() to get the interested observers updated.
>
>      so, to reuse the code across these two implementations, for instance,
>      set_mon_vals() will be implemented in ConfigProxy instead, so we can
>      have different behavior in different ConfigProxy classes. if we keep
>      using the existing single-piece md_config_t::set_mon_vals(), we have no
>      chance to differentiate the apply_changes() for seastar port. but the
>      alien implementation requires a grand lock protecting set_val() and
>      apply_changes(), so we have to move the lock from md_config_t up to
>      ConfigProxy. it's also simpler this way, as we don't need an extra layer
>      to have a dummy Mutex for seastar's ConfigProxy. as only the alien's
>      ConfigProxy requires the lock.
>
>      Signed-off-by: Kefu Chai <kchai@xxxxxxxxxx>
>
> David
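
For readers following the commit message quoted above, here is a minimal
sketch of the two-step "alien" update it describes: the proxy, not the
underlying value store, owns the lock, and set_val() and apply_changes()
can be run either separately or under a single lock acquisition. This is
not the Ceph code. ConfigValues, LockedConfigProxy, set_val_and_apply()
and the observer signature are hypothetical names made up for
illustration, and std::mutex stands in for Ceph's Mutex class.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for md_config_t: plain storage with no lock of its own.
struct ConfigValues {
  std::map<std::string, std::string> values;
  std::set<std::string> changed;  // keys modified since the last apply_changes()
};

// Hypothetical stand-in for the "alien" ConfigProxy: the lock lives here.
class LockedConfigProxy {
public:
  using Observer =
      std::function<void(const std::string& key, const std::string& val)>;

  void add_observer(const std::string& key, Observer obs) {
    std::lock_guard<std::mutex> l(lock_);
    observers_[key].push_back(std::move(obs));
  }

  // Step 1: apply the proposed change in place.
  void set_val(const std::string& key, const std::string& val) {
    std::lock_guard<std::mutex> l(lock_);
    set_val_unlocked(key, val);
  }

  // Step 2: notify the observers interested in the changed keys.
  // Observers run with the lock held, so they must not call back into
  // the proxy (std::mutex is not recursive).
  void apply_changes() {
    std::lock_guard<std::mutex> l(lock_);
    apply_changes_unlocked();
  }

  // Both steps under one lock acquisition, so no other thread can
  // observe the state between the update and the notification.
  void set_val_and_apply(const std::string& key, const std::string& val) {
    std::lock_guard<std::mutex> l(lock_);
    set_val_unlocked(key, val);
    apply_changes_unlocked();
  }

  std::string get_val(const std::string& key) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = values_.values.find(key);
    return it == values_.values.end() ? std::string() : it->second;
  }

private:
  void set_val_unlocked(const std::string& key, const std::string& val) {
    values_.values[key] = val;
    values_.changed.insert(key);
  }

  void apply_changes_unlocked() {
    for (const auto& key : values_.changed) {
      auto it = observers_.find(key);
      if (it == observers_.end()) continue;
      for (const auto& obs : it->second) obs(key, values_.values[key]);
    }
    values_.changed.clear();
  }

  std::mutex lock_;
  ConfigValues values_;
  std::map<std::string, std::vector<Observer>> observers_;
};
```

The private *_unlocked helpers are what let the combined call take the
lock only once; calling the public set_val() and apply_changes() from
inside set_val_and_apply() would try to lock the same non-recursive
mutex twice.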



-- 
Regards
Kefu Chai


