On Thu, Sep 9, 2010 at 3:27 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote: > Hi Nat, > > Can you add > > debug mds = 20 > debug ms = 1 > > to the [mds] section of your ceph.conf, restart the mds, and post the > resulting mds log (normally in /var/log/ceph)? > > It looks like interval_set is throwing the assertion because it's > inserting a value into the set that is already there.. the question is > why. Once we identify where the error came from, we can also make the > journal replay tolerate the condition as best it can. > > Thanks! > sage > Sage, I have completed this please see below for the last little bit before the crash 10.09.09_08:01:28.390543 44cef940 mds0.journal EMetaBlob.replay dir 10000385002 10.09.09_08:01:28.390550 44cef940 mds0.journal EMetaBlob.replay updated dir [dir 10000385002 /data/md_public/tmp_krx/ [2,head] auth v=9175 cv=0/0 state=1610612738|complete f(v0 m10.09.03_13:49:31.000144 190=1+189) n(v123 rc10.09.03_14:51:22.000677 b9509807549 1032=655+377) hs=190+1,ss=0+0 dirty=191 | child dirty 0x7fe8ccaced38] 10.09.09_08:01:28.390574 44cef940 mds0.journal EMetaBlob.replay for [2,head] had [dentry #1/data/md_public/tmp_krx/99 [2,head] auth (dversion lock) v=9174 inode=0x7fe8d06e5228 | inodepin dirty 0x7fe8c22dae30] 10.09.09_08:01:28.390590 44cef940 mds0.journal EMetaBlob.replay for [2,head] had [inode 100003850a8 [...2,head] /data/md_public/tmp_krx/99/ auth v9174 f(v2 m10.09.03_13:23:04.000779 1=0+1) n(v10 rc10.09.03_13:47:44.000284 b36219707 7=5+2) (inest sync dirty) (ifile sync dirty) (iversion lock) | dirtyscattered dirfrag dirty 0x7fe8d06e5228] 10.09.09_08:01:28.390609 44cef940 mds0.journal EMetaBlob.replay request client6312:2695 707 10.09.09_08:01:28.390626 44cef940 mds0.log _replay 16078321005~1330 / 16113176521 : EUpdate scatter_writebehind [metablob 10000385002, 2 dirs] 10.09.09_08:01:28.390634 44cef940 mds0.journal EMetaBlob.replay 2 dirlumps 10.09.09_08:01:28.390639 44cef940 mds0.journal EMetaBlob.replay dir 10000385002 10.09.09_08:01:28.390646 44cef940 mds0.cache.dir(10000385002) mark_dirty (already dirty) [dir 10000385002 /data/md_public/tmp_krx/ [2,head] auth v=9177 cv=0/0 state=1610612738|complete f(v0 m10.09.03_13:49:31.000144 190=1+189) n(v123 rc10.09.03_14:51:22.000677 b9509807549 1032=655+377) hs=190+1,ss=0+0 dirty=191 | child dirty 0x7fe8ccaced38] version 9177 10.09.09_08:01:28.390668 44cef940 mds0.journal EMetaBlob.replay clean nestinfo on [dir 10000385002 /data/md_public/tmp_krx/ [2,head] auth v=9177 cv=0/0 state=1610612738|complete f(v0 m10.09.03_13:49:31.000144 190=1+189) n(v123 rc10.09.03_14:51:22.000677 b9509807549 1032=655+377) hs=190+1,ss=0+0 dirty=191 | child dirty 0x7fe8ccaced38] 10.09.09_08:01:28.390687 44cef940 mds0.journal EMetaBlob.replay updated dir [dir 10000385002 /data/md_public/tmp_krx/ [2,head] auth v=9177 cv=0/0 state=1610612738|complete f(v0 m10.09.03_13:49:31.000144 190=1+189) n(v123 rc10.09.03_14:51:22.000677 b9509807549 1032=655+377) hs=190+1,ss=0+0 dirty=191 | child dirty 0x7fe8ccaced38] 10.09.09_08:01:28.390709 44cef940 mds0.journal EMetaBlob.replay for [2,head] had [dentry #1/data/md_public/tmp_krx/99 [2,head] auth (dversion lock) v=9176 inode=0x7fe8d06e5228 | inodepin dirty 0x7fe8c22dae30] 10.09.09_08:01:28.390721 44cef940 mds0.journal EMetaBlob.replay for [2,head] had [inode 100003850a8 [...2,head] /data/md_public/tmp_krx/99/ auth v9176 f(v2 m10.09.03_13:23:04.000779 1=0+1) n(v10 rc10.09.03_13:47:44.000284 b36219707 7=5+2) (inest sync dirty) (ifile sync dirty) (iversion lock) | dirtyscattered dirfrag dirty 0x7fe8d06e5228] 10.09.09_08:01:28.390741 44cef940 mds0.journal EMetaBlob.replay dir 100003850a8 10.09.09_08:01:28.390747 44cef940 mds0.journal EMetaBlob.replay updated dir [dir 100003850a8 /data/md_public/tmp_krx/99/ [2,head] auth v=207 cv=0/0 state=1610612738|complete f(v2 m10.09.03_13:23:04.000779 1=0+1) n(v10 rc10.09.03_13:47:44.000284 b36219707 6=5+1) hs=1+0,ss=0+0 dirty=1 | child dirty 0x7fe8ccb0b428] 10.09.09_08:01:28.390771 44cef940 mds0.journal EMetaBlob.replay for [2,head] had [dentry #1/data/md_public/tmp_krx/99/output [2,head] auth (dversion lock) v=206 inode=0x7fe8d06e5a70 | inodepin dirty 0x7fe8c22db000] 10.09.09_08:01:28.390787 44cef940 mds0.journal EMetaBlob.replay for [2,head] had [inode 100003850a9 [...2,head] /data/md_public/tmp_krx/99/output/ auth v206 f(v2 m10.09.03_13:47:44.000284 5=5+0) n(v0 rc10.09.03_13:47:44.000284 b36219707 6=5+1) (inest sync dirty) (ifile sync dirty) (iversion lock) | dirtyscattered dirfrag dirty 0x7fe8d06e5a70] 10.09.09_08:01:28.390826 44cef940 mds0.log _replay 16078322339~1343 / 16113176521 : EUpdate setattr [metablob 10000385002, 2 dirs] 10.09.09_08:01:28.390835 44cef940 mds0.journal EMetaBlob.replay 2 dirlumps 10.09.09_08:01:28.390841 44cef940 mds0.journal EMetaBlob.replay dir 10000385002 10.09.09_08:01:28.390847 44cef940 mds0.cache.dir(10000385002) mark_dirty (already dirty) [dir 10000385002 /data/md_public/tmp_krx/ [2,head] auth v=9179 cv=0/0 state=1610612738|complete f(v0 m10.09.03_13:49:31.000144 190=1+189) n(v123 rc10.09.03_14:51:22.000677 b9509807549 1032=655+377) hs=190+1,ss=0+0 dirty=191 | child dirty 0x7fe8ccaced38] version 9179 10.09.09_08:01:28.390868 44cef940 mds0.journal EMetaBlob.replay clean nestinfo on [dir 10000385002 /data/md_public/tmp_krx/ [2,head] auth v=9179 cv=0/0 state=1610612738|complete f(v0 m10.09.03_13:49:31.000144 190=1+189) n(v123 rc10.09.03_14:51:22.000677 b9509807549 1032=655+377) hs=190+1,ss=0+0 dirty=191 | child dirty 0x7fe8ccaced38] 10.09.09_08:01:28.390887 44cef940 mds0.journal EMetaBlob.replay updated dir [dir 10000385002 /data/md_public/tmp_krx/ [2,head] auth v=9179 cv=0/0 state=1610612738|complete f(v0 m10.09.03_13:49:31.000144 190=1+189) n(v123 rc10.09.03_14:51:22.000677 b9509807549 1032=655+377) hs=190+1,ss=0+0 dirty=191 | child dirty 0x7fe8ccaced38] 10.09.09_08:01:28.390911 44cef940 mds0.journal EMetaBlob.replay for [2,head] had [dentry #1/data/md_public/tmp_krx/99 [2,head] auth (dversion lock) v=9178 inode=0x7fe8d06e5228 | inodepin dirty 0x7fe8c22dae30] 10.09.09_08:01:28.390924 44cef940 mds0.journal EMetaBlob.replay for [2,head] had [inode 100003850a8 [...2,head] /data/md_public/tmp_krx/99/ auth v9178 f(v2 m10.09.03_13:23:04.000779 1=0+1) n(v10 rc10.09.03_13:47:44.000284 b36219707 7=5+2) (inest sync dirty) (ifile sync dirty) (iversion lock) | dirtyscattered dirfrag dirty 0x7fe8d06e5228] 10.09.09_08:01:28.390943 44cef940 mds0.journal EMetaBlob.replay dir 100003850a8 10.09.09_08:01:28.390950 44cef940 mds0.journal EMetaBlob.replay updated dir [dir 100003850a8 /data/md_public/tmp_krx/99/ [2,head] auth v=209 cv=0/0 state=1610612738|complete f(v2 m10.09.03_13:23:04.000779 1=0+1) n(v10 rc10.09.03_13:47:44.000284 b36219707 6=5+1) hs=1+0,ss=0+0 dirty=1 | child dirty 0x7fe8ccb0b428] 10.09.09_08:01:28.390974 44cef940 mds0.journal EMetaBlob.replay for [2,head] had [dentry #1/data/md_public/tmp_krx/99/output [2,head] auth (dversion lock) v=208 inode=0x7fe8d06e5a70 | inodepin dirty 0x7fe8c22db000] 10.09.09_08:01:28.390988 44cef940 mds0.journal EMetaBlob.replay for [2,head] had [inode 100003850a9 [...2,head] /data/md_public/tmp_krx/99/output/ auth v208 f(v2 m10.09.03_13:47:44.000284 5=5+0) n(v0 rc10.09.03_13:47:44.000284 b36219707 6=5+1) (inest sync dirty) (ifile sync dirty) (iversion lock) | dirtyscattered dirfrag dirty 0x7fe8d06e5a70] 10.09.09_08:01:28.391008 44cef940 mds0.journal EMetaBlob.replay request client6312:2697 707 10.09.09_08:01:28.391025 44cef940 mds0.log _replay 16078323686~1406 / 16113176521 : EUpdate openc [metablob 100003850a8, 2 dirs alloc_ino=10000385e80 prealloc_ino=[10000385e81~3e8] v3685237] 10.09.09_08:01:28.391034 44cef940 mds0.journal EMetaBlob.replay 2 dirlumps 10.09.09_08:01:28.391040 44cef940 mds0.journal EMetaBlob.replay dir 100003850a8 10.09.09_08:01:28.391046 44cef940 mds0.cache.dir(100003850a8) mark_dirty (already dirty) [dir 100003850a8 /data/md_public/tmp_krx/99/ [2,head] auth v=211 cv=0/0 state=1610612738|complete f(v2 m10.09.03_13:23:04.000779 1=0+1) n(v10 rc10.09.03_14:56:31.000710 b36219707 7=6+1)/n(v10 rc10.09.03_13:47:44.000284 b36219707 6=5+1) hs=1+0,ss=0+0 dirty=1 | child dirty 0x7fe8ccb0b428] version 211 10.09.09_08:01:28.391068 44cef940 mds0.journal EMetaBlob.replay dirty nestinfo on [dir 100003850a8 /data/md_public/tmp_krx/99/ [2,head] auth v=211 cv=0/0 state=1610612738|complete f(v2 m10.09.03_13:23:04.000779 1=0+1) n(v10 rc10.09.03_14:56:31.000710 b36219707 7=6+1)/n(v10 rc10.09.03_13:47:44.000284 b36219707 6=5+1) hs=1+0,ss=0+0 dirty=1 | child dirty 0x7fe8ccb0b428] 10.09.09_08:01:28.391099 44cef940 mds0.locker mark_updated_scatterlock (inest sync dirty) - already on list since 10.09.09_08:01:25.431917 10.09.09_08:01:28.391111 44cef940 mds0.journal EMetaBlob.replay updated dir [dir 100003850a8 /data/md_public/tmp_krx/99/ [2,head] auth v=211 cv=0/0 state=1610612738|complete f(v2 m10.09.03_13:23:04.000779 1=0+1) n(v10 rc10.09.03_14:56:31.000710 b36219707 7=6+1)/n(v10 rc10.09.03_13:47:44.000284 b36219707 6=5+1) hs=1+0,ss=0+0 dirty=1 | child dirty 0x7fe8ccb0b428] 10.09.09_08:01:28.391136 44cef940 mds0.journal EMetaBlob.replay for [2,head] had [dentry #1/data/md_public/tmp_krx/99/output [2,head] auth (dversion lock) v=210 inode=0x7fe8d06e5a70 | inodepin dirty 0x7fe8c22db000] 10.09.09_08:01:28.391150 44cef940 mds0.journal EMetaBlob.replay for [2,head] had [inode 100003850a9 [...2,head] /data/md_public/tmp_krx/99/output/ auth v210 f(v2 m10.09.03_14:56:31.000710 6=6+0) n(v0 rc10.09.03_14:56:31.000710 b36219707 7=6+1) (inest sync dirty) (ifile sync dirty) (iversion lock) | dirtyscattered dirfrag dirty 0x7fe8d06e5a70] 10.09.09_08:01:28.391174 44cef940 mds0.journal EMetaBlob.replay dir 100003850a9 10.09.09_08:01:28.391180 44cef940 mds0.journal EMetaBlob.replay updated dir [dir 100003850a9 /data/md_public/tmp_krx/99/output/ [2,head] auth v=185 cv=0/0 state=1610612738|complete f(v2 m10.09.03_14:56:31.000710 6=6+0) n(v0 rc10.09.03_14:56:31.000710 b36219707 6=6+0) hs=5+15,ss=0+0 dirty=20 | child dirty 0x7fe8ccb0ba20] 10.09.09_08:01:28.391212 44cef940 mds0.cache.dir(100003850a9) add_null_dentry [dentry #1/data/md_public/tmp_krx/99/output/.Kexpiring_2D_futures.zblock.bGxmqP [2,head] auth NULL (dversion lock) pv=0 v=185 inode=0 0x7fe8c245df80] 10.09.09_08:01:28.391228 44cef940 mds0.journal EMetaBlob.replay added [dentry #1/data/md_public/tmp_krx/99/output/.Kexpiring_2D_futures.zblock.bGxmqP [2,head] auth NULL (dversion lock) v=184 inode=0 | dirty 0x7fe8c245df80] 10.09.09_08:01:28.391250 44cef940 mds0.cache.dir(100003850a9) link_primary_inode [dentry #1/data/md_public/tmp_krx/99/output/.Kexpiring_2D_futures.zblock.bGxmqP [2,head] auth NULL (dversion lock) v=184 inode=0 | dirty 0x7fe8c245df80] [inode 10000385e80 [2,head] #10000385e80 auth v184 s=0 nl=1 n(v0 1=1+0) (iversion lock) cr={6312=0-4194304@0} 0x7fe8d09ecfc0] 10.09.09_08:01:28.391278 44cef940 mds0.journal EMetaBlob.replay added [inode 10000385e80 [2,head] /data/md_public/tmp_krx/99/output/.Kexpiring_2D_futures.zblock.bGxmqP auth v184 s=0 nl=1 n(v0 1=1+0) (iversion lock) cr={6312=0-4194304@0} | dirty 0x7fe8d09ecfc0] 10.09.09_08:01:28.391292 44cef940 mds0.journal EMetaBlob.replay noting opened inode [inode 10000385e80 [2,head] /data/md_public/tmp_krx/99/output/.Kexpiring_2D_futures.zblock.bGxmqP auth v184 s=0 nl=1 n(v0 1=1+0) (iversion lock) cr={6312=0-4194304@0} | dirty 0x7fe8d09ecfc0] 10.09.09_08:01:28.391308 44cef940 mds0.journal EMetaBlob.replay inotable v 3685237 - 1 == table 3685235 allocated+used 10000385e80 prealloc [10000385e81~3e8] 10.09.09_08:01:28.391317 44cef940 mds0.inotable: replay_alloc_id 10000385e80 10.09.09_08:01:28.391329 44cef940 mds0.inotable: replay_alloc_ids [10000385e81~3e8] 10.09.09_08:01:28.391338 44cef940 mds0.journal EMetaBlob.replay sessionmap v7370409 -(1|2) == table 7370408 prealloc [10000385e81~3e8] used 0 10.09.09_08:01:28.391345 44cef940 mds0.journal (session prealloc [10000385e80~3e8]) ./include/interval_set.h: In function 'void interval_set<T>::insert(T, T) [with T = inodeno_t]': ./include/interval_set.h:202: FAILED assert(0) 1: (EMetaBlob::replay(MDS*, LogSegment*)+0x3f75) [0x691625] 2: (EUpdate::replay(MDS*)+0x38) [0x694d28] 3: (MDLog::_replay_thread()+0x68e) [0x68801e] 4: (MDLog::ReplayThread::entry()+0xd) [0x4bb3cd] 5: (Thread::_entry_func(void*)+0xa) [0x49c71a] 6: /lib64/libpthread.so.0 [0x3909c0673d] 7: (clone()+0x6d) [0x39094d3d1d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html