Team,

Right now we have an obsolete script in src/script/sepia_bt.sh for debugging core dumps which no longer works out of the box. It used to work by unpacking the RPMs and passing certain options to gdb so it would load the proper executable/debug info. My understanding is that the script is abandoned and most devs are debugging through manual processes.

I went ahead and revived the effort to have a simple, automated way to debug core dumps. It uses Docker for throwaway containers, so we can install the Ceph packages normally and have full access to the surrounding OS environment. This is available as a simple script which sets up a Docker container with all of the Ceph + debug packages installed. It is in this PR:

https://github.com/ceph/ceph/pull/16375

Using it on a sepia lab machine (e.g. a senta box) with Docker allows you to trivially gdb a core dump (N.B. sudo is required to use docker on senta):

pdonnell@senta02:~/ceph$ sudo src/script/ceph-debug-docker.sh wip-pdonnell-20170713
branch: wip-pdonnell-20170713
specify the build environment [default "centos:7"]:
env: centos:7
/tmp/tmp.ufb7w7AeLi ~/ceph
docker build --tag pdonnell/ceph-ci:wip-pdonnell-20170713-centos-7 -
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM centos:7
 ---> 36540f359ca3
[...]
 ---> 405e85b3d4e1
Removing intermediate container 3325d7895092
Successfully built 405e85b3d4e1

real    10m11.611s
user    0m0.040s
sys     0m0.052s
~/ceph
built image pdonnell/ceph-ci:wip-pdonnell-20170713-centos-7
docker run -ti -v /ceph:/ceph:ro pdonnell/ceph-ci:wip-pdonnell-20170713-centos-7
[root@ca90ceb29dd6 ~]# gdb -q /ceph/teuthology-archive/pdonnell-2017-07-14_04:28:28-fs-wip-pdonnell-20170713-distro-basic-smithi/1399080/remote/smithi131/coredump/1500013578.8678.core
[...]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `ceph-osd -f --cluster ceph -i 3'.
Program terminated with signal 6, Aborted.
#0  0x00007f4dffdd5741 in pthread_kill () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 fuse-libs-2.9.2-7.el7.x86_64 glibc-2.17-157.el7_3.4.x86_64 gperftools-libs-2.4-8.el7.x86_64 leveldb-1.12.0-11.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libblkid-2.23.2-33.el7_3.2.x86_64 libgcc-4.8.5-11.el7.x86_64 libibverbs-1.2.1-1.el7.x86_64 libnl3-3.2.28-3.el7_3.x86_64 libstdc++-4.8.5-11.el7.x86_64 libunwind-1.1-5.el7_2.2.x86_64 libuuid-2.23.2-33.el7_3.2.x86_64 lttng-ust-2.4.1-4.el7.x86_64 nspr-4.13.1-1.0.el7_3.x86_64 nss-3.28.4-1.2.el7_3.x86_64 nss-softokn-3.16.2.3-14.4.el7.x86_64 nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 nss-util-3.28.4-1.0.el7_3.x86_64 snappy-1.1.0-3.el7.x86_64 sqlite-3.7.17-8.el7.x86_64 userspace-rcu-0.7.16-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007f4dffdd5741 in pthread_kill () from /lib64/libpthread.so.0
#1  0x00007f4e032ec008 in ceph::HeartbeatMap::_check (this=this@entry=0x7f4e0d561400, h=h@entry=0x7f4e0dc6fad0, who=who@entry=0x7f4e036c77b6 "reset_timeout", now=now@entry=1500013578) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/common/HeartbeatMap.cc:82
#2  0x00007f4e032ec465 in ceph::HeartbeatMap::reset_timeout (this=0x7f4e0d561400, h=h@entry=0x7f4e0dc6fad0, grace=60, suicide_grace=suicide_grace@entry=0) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/common/HeartbeatMap.cc:94
#3  0x00007f4e02c9c0ef in OSD::ShardedOpWQ::_process (this=0x7f4e0d885240, thread_index=<optimized out>, hb=0x7f4e0dc6fad0) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/osd/OSD.cc:9896
#4  0x00007f4e031e2179 in ShardedThreadPool::shardedthreadpool_worker (this=0x7f4e0d884a30, thread_index=<optimized out>) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/common/WorkQueue.cc:343
#5  0x00007f4e031e4300 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/common/WorkQueue.h:689
#6  0x00007f4dffdd0dc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f4dfeec476d in clone () from /lib64/libc.so.6
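Note gdb's "Missing separate debuginfos" hint above: the Ceph debug packages are already in the image, but if you also want symbols for the system libraries, the container is a normal CentOS environment, so you can install them right in it. A rough sketch (debuginfo-install comes from yum-utils on CentOS 7; the glibc/libstdc++/libgcc names here are just examples pulled from gdb's hint, so substitute whatever gdb lists for your core):

# Inside the container: install yum-utils first if debuginfo-install
# is missing, then pull in debuginfo for the libraries gdb named.
yum install -y yum-utils
debuginfo-install -y glibc libstdc++ libgcc

# Back on the host: the built image persists, so starting another
# throwaway container for a different core from the same build is just
# another docker run (--rm removes the container on exit).
sudo docker run -ti --rm -v /ceph:/ceph:ro pdonnell/ceph-ci:wip-pdonnell-20170713-centos-7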
The whole process of building the image takes approximately 10 minutes. You can use the same container to debug multiple core dumps from the same build and OS environment (e.g. centos:7). Keep in mind that you can set up these containers while teuthology tests are still running if you're anticipating core dumps to debug.

-- 
Patrick Donnelly