Re: ceph-mds crash v12.0.3

Hi Yan,

I've just checked my build process again...

Your patch did get applied to journal.cc in the tree cloned from git.

However, when I ran make-dist, the resulting
ceph-12.0.3-1744-g84d57eb.tar.bz2 contained an unpatched journal.cc
- presumably make-dist is pulling a fresh copy of journal.cc from github?
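
In case make-dist only packages committed content (I haven't verified
this), committing the patch locally before building the tarball might
keep it from being dropped. A rough sketch of what I mean - the commit
message and globs are illustrative only:

    cd ceph
    git apply ceph-mds.patch
    git commit -am "local only: EOpen::replay snapshot patch"
    ./make-dist
    # sanity-check that the tarball really carries the patched journal.cc
    tar xjf ceph-12.0.3-*.tar.bz2
    grep -n -A 2 "EOpen.replay ino" ceph-12.0.3-*/src/mds/journal.cc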

If I instead run ./do_cmake.sh ; cd build ; make
the resulting ceph-mds binary works perfectly.

So I've copied the good binary over /usr/bin/ceph-mds and, good news, my
MDS servers now work, so the file system is accessible.
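
For anyone hitting the same issue, the manual install amounted to roughly
the following (a sketch only - I'm assuming systemd-managed MDS daemons,
"cephfs1" is just the id from my log file name, and build/bin is where my
cmake build left the binary):

    # on each MDS node
    systemctl stop ceph-mds@cephfs1
    cp build/bin/ceph-mds /usr/bin/ceph-mds
    systemctl start ceph-mds@cephfs1
    systemctl status ceph-mds@cephfs1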

By the way, I recall Greg Farnum warning against snapshots in June 2016:
<http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/010812.html>

Are snapshots still considered highly dangerous? If so, is this likely
to change in the next year?

thanks again,

Jake



On 16/06/17 09:19, Jake Grimmett wrote:
> Hi Yan,
> 
> Many thanks for getting back to me - sorry to cause you bother.
> 
> I think I'm patching OK, but can you please check my methodology?
> 
> git clone git://github.com/ceph/ceph ; cd ceph
> 
> git apply ceph-mds.patch ; ./make-srpm.sh 
> 
> rpmbuild --rebuild /root/ceph/ceph/ceph-12.0.3-1661-g3ddbfcd.el7.src.rpm
> 
> 
> Here is the relevant section of the patched src/mds/journal.cc:
> 
>    2194   // note which segments inodes belong to, so we don't have to start rejournaling them
>    2195   for (const auto &ino : inos) {
>    2196     CInode *in = mds->mdcache->get_inode(ino);
>    2197     if (!in) {
>    2198       dout(0) << "EOpen.replay ino " << ino << " not in metablob" << dendl;
>    2199       assert(in);
>    2200     }
>    2201     _segment->open_files.push_back(&in->item_open_file);
>    2202   }
>    2203   for (const auto &vino : snap_inos) {
>    2204     CInode *in = mds->mdcache->get_inode(vino);
>    2205     if (!in) {
>    2206       dout(0) << "EOpen.replay ino " << vino << " not in metablob" << dendl;
>    2207       continue;
>    2208     }
> 
> many thanks for your time,
> 
> Jake
> 
> 
> On 16/06/17 08:04, Yan, Zheng wrote:
>> On Thu, Jun 15, 2017 at 7:32 PM, Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:
>>> Hi Yan,
>>>
>>> Many thanks for looking into this and providing a patch.
>>>
>>> I've downloaded ceph 12.0.3-1661-g3ddbfcd, applied your patch, rebuilt
>>> the rpms, and installed across my cluster.
>>>
>>> Unfortunately, the MDS are still crashing, any ideas welcome :)
>>>
>>> With "debug_mds = 10" the full log is 140MB; a truncated version of the
>>> log immediately preceding the crash follows:
>>>
>>> best,
>>>
>>> Jake
>>>
>>>     -5> 2017-06-15 12:21:14.084373 7f77fe590700 10 mds.0.journal
>>> EMetaBlob.replay added (full) [dentry
>>> #1/isilon/sc/users/spc/JessComb_AB_230115/JessB_TO_190115_F6_1/n0/JessB_TO_190115_F6_1.peaks_int
>>> [9f,head] auth NULL (dversion lock) v=3104 inode=0
>>> state=1073741888|bottomlru 0x7f781a3f1860]
>>>     -4> 2017-06-15 12:21:14.084375 7f77fe590700 10 mds.0.journal
>>> EMetaBlob.replay added [inode 1000147f773 [9f,head]
>>> /isilon/sc/users/spc/JessComb_AB_230115/JessB_TO_190115_F6_1/n0/JessB_TO_190115_F6_1.peaks_int
>>> auth v3104 s=4 n(v0 b4 1=1+0) (iversion lock) cr={3554272=0-4194304@9e}
>>> 0x7f781a3f5800]
>>>     -3> 2017-06-15 12:21:14.084379 7f77fe590700 10 mds.0.journal
>>> EMetaBlob.replay added (full) [dentry
>>> #1/isilon/sc/users/spc/JessComb_AB_230115/JessB_TO_190115_F6_1/n0/JessB_TO_190115_F6_1.peaks_maxt
>>> [9f,head] auth NULL (dversion lock) v=3132 inode=0
>>> state=1073741888|bottomlru 0x7f781a3f1d40]
>>>     -2> 2017-06-15 12:21:14.084381 7f77fe590700 10 mds.0.journal
>>> EMetaBlob.replay added [inode 1000147f775 [9f,head]
>>> /isilon/sc/users/spc/JessComb_AB_230115/JessB_TO_190115_F6_1/n0/JessB_TO_190115_F6_1.peaks_maxt
>>> auth v3132 s=4 n(v0 b4 1=1+0) (iversion lock) cr={3554272=0-4194304@9e}
>>> 0x7f781a3f5e00]
>>>     -1> 2017-06-15 12:21:14.084406 7f77fe590700  0 mds.0.journal
>>> EOpen.replay ino 1000147761b.9a not in metablob
>>>      0> 2017-06-15 12:21:14.085348 7f77fe590700 -1
>>> /root/rpmbuild/BUILD/ceph-12.0.3-1661-g3ddbfcd/src/mds/journal.cc: In
>>> function 'virtual void EOpen::replay(MDSRank*)' thread 7f77fe590700 time
>>> 2017-06-15 12:21:14.084409
>>> /root/rpmbuild/BUILD/ceph-12.0.3-1661-g3ddbfcd/src/mds/journal.cc: 2207:
>>> FAILED assert(in)
>>>
>> My patch should have removed that assertion. Maybe the patch didn't
>> apply cleanly.
>>
>>
>> Regards
>> Yan, Zheng
>>
>>>  ceph version 12.0.3-1661-g3ddbfcd
>>> (3ddbfcd4357ab3a3c2f17f86f88dc83172d4ce0d) luminous (dev)
>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x110) [0x7f780d290500]
>>>  2: (EOpen::replay(MDSRank*)+0x3e5) [0x7f780d2397b5]
>>>  3: (MDLog::_replay_thread()+0x5f2) [0x7f780d1efd12]
>>>  4: (MDLog::ReplayThread::entry()+0xd) [0x7f780cf9b6ad]
>>>  5: (()+0x7dc5) [0x7f780adb4dc5]
>>>  6: (clone()+0x6d) [0x7f7809e9476d]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>> --- logging levels ---
>>>    0/ 5 none
>>>    0/ 1 lockdep
>>>    0/ 1 context
>>>    1/ 1 crush
>>>   10/10 mds
>>>    1/ 5 mds_balancer
>>>    1/ 5 mds_locker
>>>    1/ 5 mds_log
>>>    1/ 5 mds_log_expire
>>>    1/ 5 mds_migrator
>>>    0/ 1 buffer
>>>    0/ 1 timer
>>>    0/ 1 filer
>>>    0/ 1 striper
>>>    0/ 1 objecter
>>>    0/ 5 rados
>>>    0/ 5 rbd
>>>    0/ 5 rbd_mirror
>>>    0/ 5 rbd_replay
>>>    0/ 5 journaler
>>>    0/ 5 objectcacher
>>>    0/ 5 client
>>>    1/ 5 osd
>>>    0/ 5 optracker
>>>    0/ 5 objclass
>>>    1/ 3 filestore
>>>    1/ 3 journal
>>>    0/ 5 ms
>>>    1/ 5 mon
>>>    0/10 monc
>>>    1/ 5 paxos
>>>    0/ 5 tp
>>>    1/ 5 auth
>>>    1/ 5 crypto
>>>    1/ 1 finisher
>>>    1/ 5 heartbeatmap
>>>    1/ 5 perfcounter
>>>    1/ 5 rgw
>>>    1/10 civetweb
>>>    1/ 5 javaclient
>>>    1/ 5 asok
>>>    1/ 1 throttle
>>>    0/ 0 refs
>>>    1/ 5 xio
>>>    1/ 5 compressor
>>>    1/ 5 bluestore
>>>    1/ 5 bluefs
>>>    1/ 3 bdev
>>>    1/ 5 kstore
>>>    4/ 5 rocksdb
>>>    4/ 5 leveldb
>>>    4/ 5 memdb
>>>    1/ 5 kinetic
>>>    1/ 5 fuse
>>>    1/ 5 mgr
>>>    1/ 5 mgrc
>>>    1/ 5 dpdk
>>>    1/ 5 eventtrace
>>>   -2/-2 (syslog threshold)
>>>   -1/-1 (stderr threshold)
>>>   max_recent     10000
>>>   max_new         1000
>>>   log_file /var/log/ceph/ceph-mds.cephfs1.log
>>> --- end dump of recent events ---
>>> 2017-06-15 12:21:14.101761 7f77fe590700 -1 *** Caught signal (Aborted) **
>>>  in thread 7f77fe590700 thread_name:md_log_replay
>>>
>>>  ceph version 12.0.3-1661-g3ddbfcd
>>> (3ddbfcd4357ab3a3c2f17f86f88dc83172d4ce0d) luminous (dev)
>>>  1: (()+0x57d7ff) [0x7f780d2507ff]
>>>  2: (()+0xf370) [0x7f780adbc370]
>>>  3: (gsignal()+0x37) [0x7f7809dd21d7]
>>>  4: (abort()+0x148) [0x7f7809dd38c8]
>>>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x284) [0x7f780d290674]
>>>  6: (EOpen::replay(MDSRank*)+0x3e5) [0x7f780d2397b5]
>>>  7: (MDLog::_replay_thread()+0x5f2) [0x7f780d1efd12]
>>>  8: (MDLog::ReplayThread::entry()+0xd) [0x7f780cf9b6ad]
>>>  9: (()+0x7dc5) [0x7f780adb4dc5]
>>>  10: (clone()+0x6d) [0x7f7809e9476d]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>> --- begin dump of recent events ---
>>>      0> 2017-06-15 12:21:14.101761 7f77fe590700 -1 *** Caught signal
>>> (Aborted) **
>>>  in thread 7f77fe590700 thread_name:md_log_replay
>>>
>>>  ceph version 12.0.3-1661-g3ddbfcd
>>> (3ddbfcd4357ab3a3c2f17f86f88dc83172d4ce0d) luminous (dev)
>>>  1: (()+0x57d7ff) [0x7f780d2507ff]
>>>  2: (()+0xf370) [0x7f780adbc370]
>>>  3: (gsignal()+0x37) [0x7f7809dd21d7]
>>>  4: (abort()+0x148) [0x7f7809dd38c8]
>>>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x284) [0x7f780d290674]
>>>  6: (EOpen::replay(MDSRank*)+0x3e5) [0x7f780d2397b5]
>>>  7: (MDLog::_replay_thread()+0x5f2) [0x7f780d1efd12]
>>>  8: (MDLog::ReplayThread::entry()+0xd) [0x7f780cf9b6ad]
>>>  9: (()+0x7dc5) [0x7f780adb4dc5]
>>>  10: (clone()+0x6d) [0x7f7809e9476d]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>> --- logging levels ---
>>>    0/ 5 none
>>>    0/ 1 lockdep
>>>    0/ 1 context
>>>    1/ 1 crush
>>>   10/10 mds
>>>    1/ 5 mds_balancer
>>>    1/ 5 mds_locker
>>>    1/ 5 mds_log
>>>    1/ 5 mds_log_expire
>>>    1/ 5 mds_migrator
>>>    0/ 1 buffer
>>>    0/ 1 timer
>>>    0/ 1 filer
>>>    0/ 1 striper
>>>    0/ 1 objecter
>>>    0/ 5 rados
>>>    0/ 5 rbd
>>>    0/ 5 rbd_mirror
>>>    0/ 5 rbd_replay
>>>    0/ 5 journaler
>>>    0/ 5 objectcacher
>>>    0/ 5 client
>>>    1/ 5 osd
>>>    0/ 5 optracker
>>>    0/ 5 objclass
>>>    1/ 3 filestore
>>>    1/ 3 journal
>>>    0/ 5 ms
>>>    1/ 5 mon
>>>    0/10 monc
>>>    1/ 5 paxos
>>>    0/ 5 tp
>>>    1/ 5 auth
>>>    1/ 5 crypto
>>>    1/ 1 finisher
>>>    1/ 5 heartbeatmap
>>>    1/ 5 perfcounter
>>>    1/ 5 rgw
>>>    1/10 civetweb
>>>    1/ 5 javaclient
>>>    1/ 5 asok
>>>    1/ 1 throttle
>>>    0/ 0 refs
>>>    1/ 5 xio
>>>    1/ 5 compressor
>>>    1/ 5 bluestore
>>>    1/ 5 bluefs
>>>    1/ 3 bdev
>>>    1/ 5 kstore
>>>    4/ 5 rocksdb
>>>    4/ 5 leveldb
>>>    4/ 5 memdb
>>>    1/ 5 kinetic
>>>    1/ 5 fuse
>>>    1/ 5 mgr
>>>    1/ 5 mgrc
>>>    1/ 5 dpdk
>>>    1/ 5 eventtrace
>>>   -2/-2 (syslog threshold)
>>>   -1/-1 (stderr threshold)
>>>   max_recent     10000
>>>   max_new         1000
>>>   log_file /var/log/ceph/ceph-mds.cephfs1.log
>>> --- end dump of recent events ---
>>>
>>>
>>> On 15/06/17 08:10, Yan, Zheng wrote:
>>>> On Wed, Jun 14, 2017 at 11:49 PM, Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:
>>>>> Dear All,
>>>>>
>>>>> Sorry, but I need to add +1 to the mds crash reports with ceph
>>>>> 12.0.3-1507-g52f0deb
>>>>>
>>>>> This happened to me after updating from 12.0.2.
>>>>> All was fairly OK for a few hours, with I/O around 500MB/s; then both MDS
>>>>> servers crashed and have not worked since.
>>>>>
>>>>> The two MDS servers are active:standby; both now crash immediately
>>>>> after being started.
>>>>>
>>>>> This cluster has been upgraded from Kraken, through several Luminous
>>>>> versions, so I did a clean install of SL7.3 on one MDS server, and still
>>>>> have crashes on this machine.
>>>>>
>>>>> The cluster has 40 x 8TB drives (EC 4+1), with dual replicated NVMe
>>>>> providing a hot pool in front of the CephFS layer. df -h /cephfs is/was
>>>>> 200TB. All OSDs are bluestore, and were created on Luminous.
>>>>>
>>>>> I enabled snapshots a few days ago, and keep 144 snapshots (one taken
>>>>> every 10 minutes, each kept for 24 hours only). About 30TB is copied
>>>>> into the fs each day. If snapshots caused the crash, I can regenerate
>>>>> the data, but they are very useful.
>>>>>
>>>>> One MDS gave this log...
>>>>>
>>>>> <http://www.mrc-lmb.cam.ac.uk/jog/ceph-mds.cephfs1.log>
>>>> It is a snapshot-related bug. The attached patch should prevent the mds
>>>> from crashing.
>>>> Next time you restart the mds, please set debug_mds=10 and upload the log.
>>>>
>>>> Regards
>>>> Yan, Zheng
>>>>
>>>>> many thanks for any suggestions, and it's great to see the experimental
>>>>> flag removed from bluestore!
>>>>>
>>>>> Jake

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


