Re: gdb in Docker for CentOS

Of course frame #21 is bogus anyway...

(gdb) f
#22 0x0000562bfb812866 in OSDService::try_get_map
(this=0x562bfdca6360, this@entry=0x562bfdb433e0, epoch=<optimized
out>, epoch@entry=58) at
/usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/osd/OSD.cc:1480
1480        map->decode(bl);

That means frame #22 should be calling 'OSDMap::decode(ceph::buffer::list&)',
but frame #21 claims we somehow ended up in 'OSDService::_get_map_bl' instead.

We did call that function recently, though, at line 1475 just above:

(gdb) l
1475        if (!_get_map_bl(epoch, bl) || bl.length() == 0) {
1476          derr << "failed to load OSD map for epoch " << epoch <<
", got " << bl.length() << " bytes" << dendl;
1477          delete map;
1478          return OSDMapRef();
1479        }
1480        map->decode(bl);
1481      } else {
1482        dout(20) << "get_map " << epoch << " - return initial " <<
map << dendl;
1483      }
1484      return _add_map(map);

So possibly gdb has found the stale return address from that earlier call
still sitting on the stack and decided it is the next frame?
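
If so, frame #21's reported pc should be a leftover return address from the
earlier, completed _get_map_bl call, still sitting in the dead stack space
below frame #22's rsp. A rough way to test that against the core (a sketch;
the pc is from the backtrace above, the 0x200/0x100 window sizes are
arbitrary):

(gdb) frame 22
(gdb) find /g $rsp - 0x200, $rsp, 0x0000562bfb8053ed
(gdb) x/16a $rsp - 0x100

A hit from find in that range, together with the symbolic annotations in the
x/16a dump, would confirm the word is a stale _get_map_bl return address
rather than a live frame.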


On Mon, Jun 25, 2018 at 9:00 AM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> The first stack looks to be so badly smashed that gdb can't unwind it.
>
> The second stack also appears to be badly corrupted.
>
> (gdb) f
> #22 0x0000562bfb812866 in OSDService::try_get_map
> (this=0x562bfdca6360, this@entry=0x562bfdb433e0, epoch=<optimized
> out>, epoch@entry=58) at
> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/osd/OSD.cc:1480
> 1480        map->decode(bl);
> (gdb) p &bl
> $13 = (ceph::bufferlist *) 0x7fe4bf8dccf0
> (gdb) p $rsp
> $15 = (void *) 0x7fe4bf8dccc0
>
> So bl is a bufferlist variable on the stack we are passing to decode.
>
> (gdb) down
> #21 0x0000562bfb8053ed in OSDService::_get_map_bl (this=0x3a,
> e=1356752640, bl=...) at
> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/osd/OSD.cc:1364
> 1364        _add_map_bl(e, bl);
> (gdb) p bl
> $16 = (ceph::bufferlist &) @0xcb4eb4a50de6700: <error reading variable>
> (gdb) p $rsp
> $18 = (void *) 0x7fe4bf8dcc20
> (gdb) x/5x 0x7fe4bf8dcc20
> 0x7fe4bf8dcc20: 0x0000000000001002      0x0cb4eb4a50de6700
> 0x7fe4bf8dcc30: 0x0000562bfdc6fc38      0x00007fe4bf8dcc70
> 0x7fe4bf8dcc40: 0x0000562bfdc7b200
>
> So it looks like the stack has been overwritten with some data and cannot
> be trusted.
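
A side note on that core: the bogus bl reference 0xcb4eb4a50de6700 needs
about 60 bits, while canonical x86-64 user pointers fit in 47, so it cannot
be a real address; it is presumably part of whatever payload landed on the
stack. To gauge how far the damage extends, one can dump an annotated window
around frame 21's $rsp (a sketch; start address and length chosen
arbitrarily):

(gdb) x/32a 0x7fe4bf8dcbc0

In the annotated output, genuine frames show saved return addresses that
resolve to symbols, while clobbered slots show raw words like
0x0cb4eb4a50de6700.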
>
>
> On Sat, Jun 23, 2018 at 6:35 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> On Fri, Jun 22, 2018 at 8:51 AM David Zafman <dzafman@xxxxxxxxxx> wrote:
>>>
>>>
>>> These are the CentOS stack traces, generated with Docker from the
>>> same runs I had been doing for
>>> https://tracker.ceph.com/issues/23492.  The crashes under Ubuntu always
>>> involved either a decode() crash, a decode() exception, or
>>> assert(st_size != 0).
>>>
>>> Is this a clue, or is it just a CentOS anomaly?  BTW, all other threads
>>> in these cores look fine!
>>
>> If the other threads look fine, that makes it sound a lot like this is
>> a clue and there's some memory corruption happening to the stack.
>> Which makes sense since you're seeing OSDMaps getting overwritten with
>> log data here, right?
>> -Greg
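
(If this reproduces under gdb rather than only as cores, a hardware
watchpoint might catch whatever writes log data over the stack. A sketch,
assuming bl is in scope at the _get_map_bl call site shown in the listing
above:

(gdb) break OSD.cc:1475
(gdb) continue
... and once the breakpoint hits:
(gdb) watch -l *(long *)&bl
(gdb) continue

'watch -l' pins the watchpoint to the first word of bl's stack slot, so gdb
stops on the next write to it. It will fire on legitimate bufferlist use
too; the interesting hit is one coming from a logging path.)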
>>
>>>
>>> 2 failures in
>>> http://pulpito.ceph.com/dzafman-2018-06-21_15:29:45-rados:standalone-wip-zafman-testing2-distro-basic-smithi/
>>>
>>>
>>> # sudo ./ceph-debug-docker.sh
>>> wip-zafman-testing2:d12ea8b6b641958cdfcf609d2fad8947a21965cf centos:7
>>>
>>> # gdb /usr/bin/ceph-osd
>>> /ceph/teuthology-archive/dzafman-2018-06-21_15:29:45-rados:standalone-wip-zafman-testing2-distro-basic-smithi/2687446/remote/smithi168/coredump/1529622699.99940.core
>>> ...
>>> warning: .dynamic section for "/lib64/libudev.so.1" is not at the
>>> expected address (wrong library or version mismatch?)
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> Core was generated by `ceph-osd -i 3
>>> --fsid=f2babf17-f782-48ce-9563-54ab3eb8dc70 --auth-supported=none'.
>>> Program terminated with signal 6, Aborted.
>>> #0  0x00007f9d992b659b in raise () from /lib64/libpthread.so.0
>>> Missing separate debuginfos, use: debuginfo-install
>>> bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.170-4.el7.x86_64
>>> elfutils-libs-0.170-4.el7.x86_64 fuse-libs-2.9.2-10.el7.x86_64
>>> glibc-2.17-222.el7.x86_64 gperftools-libs-2.6.1-1.el7.x86_64
>>> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64
>>> leveldb-1.12.0-11.el7.x86_64 libaio-0.3.109-13.el7.x86_64
>>> libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-52.el7.x86_64
>>> libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64
>>> libgcc-4.8.5-28.el7_5.1.x86_64 libibverbs-15-7.el7_5.x86_64
>>> libnl3-3.2.28-4.el7.x86_64 liboath-2.4.1-9.el7.x86_64
>>> libselinux-2.5-12.el7.x86_64 libstdc++-4.8.5-28.el7_5.1.x86_64
>>> libuuid-2.23.2-52.el7.x86_64 lttng-ust-2.4.1-4.el7.x86_64
>>> lz4-1.7.5-2.el7.x86_64 nspr-4.19.0-1.el7_5.x86_64
>>> nss-3.36.0-5.el7_5.x86_64 nss-softokn-3.36.0-5.el7_5.x86_64
>>> nss-softokn-freebl-3.36.0-5.el7_5.x86_64 nss-util-3.36.0-1.el7_5.x86_64
>>> openssl-libs-1.0.2k-12.el7.x86_64 pcre-8.32-17.el7.x86_64
>>> snappy-1.1.0-3.el7.x86_64 sqlite-3.7.17-8.el7.x86_64
>>> systemd-libs-219-57.el7.x86_64 userspace-rcu-0.7.16-1.el7.x86_64
>>> xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
>>> (gdb) thread
>>> [Current thread is 1 (Thread 0x7f9d7671f700 (LWP 100040))]
>>> (gdb) bt
>>> #0  0x00007f9d992b659b in raise () from /lib64/libpthread.so.0
>>> #1  0x0000558705d5f521 in reraise_fatal (signum=6) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/global/signal_handler.cc:74
>>> #2  handle_fatal_signal (signum=6) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/global/signal_handler.cc:138
>>> #3  <signal handler called>
>>> #4  0x00007f9d982d6277 in raise () from /lib64/libc.so.6
>>> #5  0x00007f9d982d7968 in abort () from /lib64/libc.so.6
>>> #6  0x00007f9d98be5ac5 in __cxa_vec_dtor () from /lib64/libstdc++.so.6
>>> #7  0x00007f9d98be3a63 in ?? () from /lib64/libstdc++.so.6
>>> #8  0x00007f9d76719a20 in ?? ()
>>> #9  0x0000000000000ae0 in ?? ()
>>> #10 0x00007f9d9b1fe5c6 in (anonymous namespace)::do_memalign(unsigned
>>> long, unsigned long) () from /lib64/libtcmalloc.so.4
>>> #11 0x00007f9d9b21d010 in tc_posix_memalign () from /lib64/libtcmalloc.so.4
>>> #12 0x00007f9d9b20aacc in tcmalloc::PageHeap::Carve(tcmalloc::Span*,
>>> unsigned long) () from /lib64/libtcmalloc.so.4
>>> #13 0x00007f9d9b20b591 in tcmalloc::PageHeap::New(unsigned long) () from
>>> /lib64/libtcmalloc.so.4
>>> #14 0x00007f9d9b20a230 in tcmalloc::CentralFreeList::Populate() () from
>>> /lib64/libtcmalloc.so.4
>>> #15 0x00007f9d767199d0 in ?? ()
>>> #16 0x0000000000000000 in ?? ()
>>>
>>> # gdb /usr/bin/ceph-osd
>>> /ceph/teuthology-archive/dzafman-2018-06-21_15:29:45-rados:standalone-wip-zafman-testing2-distro-basic-smithi/2687443/remote/smithi111/coredump/1529622850.110914.core
>>>
>>> Thread 1 (Thread 0x7fe4bf8e2700 (LWP 111012)):
>>> #0  0x00007fe4e247959b in raise () from /lib64/libpthread.so.0
>>> #1  0x0000562bfbd41521 in reraise_fatal (signum=6) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/global/signal_handler.cc:74
>>> #2  handle_fatal_signal (signum=6) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/global/signal_handler.cc:138
>>> #3  <signal handler called>
>>> #4  0x00007fe4e1499277 in raise () from /lib64/libc.so.6
>>> #5  0x00007fe4e149a968 in abort () from /lib64/libc.so.6
>>> #6  0x00007fe4e1da8ac5 in __cxa_vec_dtor () from /lib64/libstdc++.so.6
>>> #7  0x00007fe4e1da6a63 in ?? () from /lib64/libstdc++.so.6
>>> #8  0x00007fe4bf8dca20 in ?? ()
>>> #9  0x0000000000000ae0 in ?? ()
>>> #10 0x00007fe4e43c15c6 in (anonymous namespace)::do_memalign(unsigned
>>> long, unsigned long) () from /lib64/libtcmalloc.so.4
>>> #11 0x00007fe4e43e0010 in tc_posix_memalign () from /lib64/libtcmalloc.so.4
>>> #12 0x00007fe4e58cbc23 in raw (mempool=10, l=2710, c=<optimized out>,
>>> this=0x8) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/include/buffer_raw.h:44
>>> #13 raw_combined (mempool=10, align=2710, l=2710, dataptr=<optimized
>>> out>, this=0x8) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/common/buffer.cc:181
>>> #14 create (mempool=10, align=2710, len=2710) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/common/buffer.cc:214
>>> #15 ceph::buffer::create_aligned_in_mempool (len=2710, align=2710,
>>> mempool=10) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/common/buffer.cc:709
>>> #16 0x0cb4eb4a50de6700 in ?? ()
>>> #17 0x0000562bfc425940 in ?? ()
>>> #18 0x0000562bfbb5a030 in FileStore::read (this=<optimized out>, ch=...,
>>> oid=..., offset=<optimized out>, len=<optimized out>, bl=...,
>>> op_flags=4222983760) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/os/filestore/FileStore.cc:3382
>>> #19 0x0000000000000a96 in ?? ()
>>> #20 0x0000562bfbb59e50 in ?? () at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/os/filestore/FileStore.cc:1909
>>> #21 0x0000562bfb8053ed in OSDService::_get_map_bl (this=0x3a,
>>> e=1356752640, bl=...) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/osd/OSD.cc:1364
>>> #22 0x0000562bfb812866 in OSDService::try_get_map (this=0x562bfdca6360,
>>> this@entry=0x562bfdb433e0, epoch=<optimized out>, epoch@entry=58) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/osd/OSD.cc:1480
>>> #23 0x0000562bfb81b2dd in OSD::advance_pg
>>> (this=this@entry=0x562bfdb42000, osd_epoch=<optimized out>,
>>> pg=pg@entry=0x562bfdc34000, handle=..., rctx=rctx@entry=0x7fe4bf8dcf90)
>>> at /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/osd/OSD.cc:7747
>>> #24 0x0000562bfb81ba81 in OSD::dequeue_peering_evt (this=0x562bfdb42000,
>>> sdata=<optimized out>, pg=0x562bfdc34000, evt=std::shared_ptr (count 2,
>>> weak 0) 0x562bfdc40f90, handle=...) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/osd/OSD.cc:8877
>>> #25 0x0000562bfba741a0 in PGPeeringItem::run (this=<optimized out>,
>>> osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/osd/OpQueueItem.cc:34
>>> #26 0x0000562bfb824532 in run (handle=..., pg=..., sdata=<optimized
>>> out>, osd=<optimized out>, this=0x7fe4bf8dd140) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/osd/OpQueueItem.h:134
>>> #27 OSD::ShardedOpWQ::_process (this=0x562bfdb43048,
>>> thread_index=<optimized out>, hb=<optimized out>) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/osd/OSD.cc:9849
>>> #28 0x00007fe4e5923f63 in ShardedThreadPool::shardedthreadpool_worker
>>> (this=0x562bfdb42930, thread_index=<optimized out>) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/common/WorkQueue.cc:339
>>> #29 0x00007fe4e5924b50 in ShardedThreadPool::WorkThreadSharded::entry
>>> (this=<optimized out>) at
>>> /usr/src/debug/ceph-14.0.0-696-gd12ea8b/src/common/WorkQueue.h:690
>>> #30 0x00007fe4e2471e25 in start_thread () from /lib64/libpthread.so.0
>>> #31 0x00007fe4e1561bad in clone () from /lib64/libc.so.6
>>>
>>> David
>
>
>
> --
> Cheers,
> Brad



-- 
Cheers,
Brad