Debugging core dumps from teuthology testing

Team,

Right now we have an obsolete script, src/script/sepia_bt.sh, which no
longer works out of the box for debugging core dumps. It used to work
by unpacking the RPMs and passing certain options to gdb so that it
would load the proper executable/debug info. My understanding is that
the script has been abandoned and most devs are debugging through
manual processes.

I went ahead and revived the effort to provide a simple, automated way
to debug core dumps. It uses docker for throwaway containers so we can
install the ceph packages normally and have full access to the
surrounding OS environment. This is available as a simple script which
sets up a docker container with all of the Ceph + debug packages
installed. It is in this PR:

https://github.com/ceph/ceph/pull/16375
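
Roughly, the idea is a generated Dockerfile along the lines of the
sketch below; the repo URL and exact package set here are assumptions
for illustration only, the PR has the real logic:

# Hypothetical Dockerfile sketch: a throwaway image with the branch's
# binaries and matching debuginfo installed alongside gdb.
FROM centos:7
RUN yum install -y gdb
# Point yum at the ceph-ci build repo for the branch (URL format is an
# assumption; the script derives it from the branch you pass in).
RUN curl -L -o /etc/yum.repos.d/ceph-ci.repo \
    https://shaman.ceph.com/api/repos/ceph/wip-pdonnell-20170713/latest/centos/7/repo
RUN yum install -y ceph ceph-debuginfo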

Using it on a sepia lab machine (e.g. a senta box) with docker allows
you to trivially gdb a core dump:

(N.B. sudo is required to use docker on senta)
pdonnell@senta02:~/ceph$ sudo src/script/ceph-debug-docker.sh wip-pdonnell-20170713
branch: wip-pdonnell-20170713
specify the build environment [default "centos:7"]:
env: centos:7
/tmp/tmp.ufb7w7AeLi ~/ceph
docker build --tag pdonnell/ceph-ci:wip-pdonnell-20170713-centos-7 -
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM centos:7
 ---> 36540f359ca3
[...]
 ---> 405e85b3d4e1
Removing intermediate container 3325d7895092
Successfully built 405e85b3d4e1

real    10m11.611s
user    0m0.040s
sys     0m0.052s
~/ceph
built image pdonnell/ceph-ci:wip-pdonnell-20170713-centos-7
docker run -ti -v /ceph:/ceph:ro pdonnell/ceph-ci:wip-pdonnell-20170713-centos-7

[root@ca90ceb29dd6 ~]# gdb -q /ceph/teuthology-archive/pdonnell-2017-07-14_04:28:28-fs-wip-pdonnell-20170713-distro-basic-smithi/1399080/remote/smithi131/coredump/1500013578.8678.core
[...]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `ceph-osd -f --cluster ceph -i 3'.
Program terminated with signal 6, Aborted.
#0  0x00007f4dffdd5741 in pthread_kill () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install
bzip2-libs-1.0.6-13.el7.x86_64 fuse-libs-2.9.2-7.el7.x86_64
glibc-2.17-157.el7_3.4.x86_64 gperftools-libs-2.4-8.el7.x86_64
leveldb-1.12.0-11.el7.x86_64 libaio-0.3.109-13.el7.x86_64
libblkid-2.23.2-33.el7_3.2.x86_64 libgcc-4.8.5-11.el7.x86_64
libibverbs-1.2.1-1.el7.x86_64 libnl3-3.2.28-3.el7_3.x86_64
libstdc++-4.8.5-11.el7.x86_64 libunwind-1.1-5.el7_2.2.x86_64
libuuid-2.23.2-33.el7_3.2.x86_64 lttng-ust-2.4.1-4.el7.x86_64
nspr-4.13.1-1.0.el7_3.x86_64 nss-3.28.4-1.2.el7_3.x86_64
nss-softokn-3.16.2.3-14.4.el7.x86_64
nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64
nss-util-3.28.4-1.0.el7_3.x86_64 snappy-1.1.0-3.el7.x86_64
sqlite-3.7.17-8.el7.x86_64 userspace-rcu-0.7.16-1.el7.x86_64
zlib-1.2.7-17.el7.x86_64
(gdb)
(gdb) bt
#0  0x00007f4dffdd5741 in pthread_kill () from /lib64/libpthread.so.0
#1  0x00007f4e032ec008 in ceph::HeartbeatMap::_check
(this=this@entry=0x7f4e0d561400, h=h@entry=0x7f4e0dc6fad0,
who=who@entry=0x7f4e036c77b6 "reset_timeout",
now=now@entry=1500013578)
    at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/common/HeartbeatMap.cc:82
#2  0x00007f4e032ec465 in ceph::HeartbeatMap::reset_timeout
(this=0x7f4e0d561400, h=h@entry=0x7f4e0dc6fad0, grace=60,
suicide_grace=suicide_grace@entry=0)
    at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/common/HeartbeatMap.cc:94
#3  0x00007f4e02c9c0ef in OSD::ShardedOpWQ::_process
(this=0x7f4e0d885240, thread_index=<optimized out>, hb=0x7f4e0dc6fad0)
at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/osd/OSD.cc:9896
#4  0x00007f4e031e2179 in ShardedThreadPool::shardedthreadpool_worker
(this=0x7f4e0d884a30, thread_index=<optimized out>) at
/usr/src/debug/ceph-12.1.0-990-gb36c57d/src/common/WorkQueue.cc:343
#5  0x00007f4e031e4300 in ShardedThreadPool::WorkThreadSharded::entry
(this=<optimized out>) at
/usr/src/debug/ceph-12.1.0-990-gb36c57d/src/common/WorkQueue.h:689
#6  0x00007f4dffdd0dc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f4dfeec476d in clone () from /lib64/libc.so.6
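
If you also want the system-library frames resolved, you can run the
debuginfo-install command gdb suggests above from inside the same
container (this assumes yum-utils is installed and the distro's
debuginfo repos are reachable from the container), e.g.:

# Example: pull debuginfo for a few of the libraries gdb listed above
yum install -y yum-utils
debuginfo-install -y glibc libgcc libstdc++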


The whole process to build the image takes approximately 10 minutes.
You can use the same container to debug multiple core dumps from the
same build and OS environment (e.g. centos:7). Keep in mind that you
can set up these containers while teuthology tests are still running
if you're anticipating core dumps to debug.
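
Reattaching later is just re-running the image that was built above
(the image tag below is from this example run; the core path is a
placeholder following the pattern of the example path earlier):

sudo docker run -ti -v /ceph:/ceph:ro pdonnell/ceph-ci:wip-pdonnell-20170713-centos-7
# then inside the container, point gdb at whichever core you need:
gdb -q /ceph/teuthology-archive/<run>/<job>/remote/<host>/coredump/<timestamp>.core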

-- 
Patrick Donnelly