Re: multi-mds crash on first file create (git/unstable)

On Thu, 30 Sep 2010, Thomas Mueller wrote:
> hi
> 
> now on git/unstable - no more git/master.
> 
> started with vstart.sh and
> 
> export CEPH_NUM_MON=1
> export CEPH_NUM_OSD=1
> export CEPH_NUM_MDS=3
> 
> 
> right after first file creation (mktemp 
> /<pathto>/testspace/ceph_basiccheck_testspace.XXXX) the mds0
> crashed and the kclient hangs now.
> 
> do i need the 2.6.36rc kclient to work with multi-mds testing?

No need.  It was a bad assert I introduced a few days ago.  I've reverted
it for now and will audit this code later.  The fix is pushed to unstable.

Thanks!
sage



> 
> rev: 7657a6d5b30dd181350acf19681847d9c8f5d694
> 
> - Thomas
> 
> 2010-09-30 19:37:44.538704 7f7802dea710 mds0.locker eval done
> 2010-09-30 19:37:44.538715 7f7802dea710 mds0.server dispatch_client_request client_request(client4106:27 create #1/ceph_basiccheck_testspace.VArD)
> 2010-09-30 19:37:44.538733 7f7802dea710 mds0.server open w/ O_CREAT on #1/ceph_basiccheck_testspace.VArD
> 2010-09-30 19:37:44.538746 7f7802dea710 mds0.server rdlock_path_xlock_dentry request(client4106:27 cr=0x29e7b40) #1/ceph_basiccheck_testspace.VArD
> 2010-09-30 19:37:44.538758 7f7802dea710 mds0.server traverse_to_auth_dir dirpath #1 dname ceph_basiccheck_testspace.VArD
> 2010-09-30 19:37:44.538769 7f7802dea710 mds0.cache traverse: opening base ino 1 snap head
> 2010-09-30 19:37:44.538781 7f7802dea710 mds0.cache path_traverse finish on snapid head
> 2010-09-30 19:37:44.538792 7f7802dea710 mds0.server traverse_to_auth_dir [dir 1 / [2,head] auth{1=2,2=1} v=1 cv=1/1 REP dir_auth=0 state=1073741826|complete f(v0 1=0+1) n(v0 1=0+1) hs=1+6,ss=0+0 | child subtree replicated 0x29fd000]
> 2010-09-30 19:37:44.538808 7f7802dea710 mds0.server rdlock_path_xlock_dentry dir [dir 1 / [2,head] auth{1=2,2=1} v=1 cv=1/1 REP dir_auth=0 state=1073741826|complete f(v0 1=0+1) n(v0 1=0+1) hs=1+6,ss=0+0 | child subtree replicated 0x29fd000]
> 2010-09-30 19:37:44.538823 7f7802dea710 mds0.server prepare_null_dentry ceph_basiccheck_testspace.VArD in [dir 1 / [2,head] auth{1=2,2=1} v=1 cv=1/1 REP dir_auth=0 state=1073741826|complete f(v0 1=0+1) n(v0 1=0+1) hs=1+6,ss=0+0 | child subtree replicated 0x29fd000]
> 2010-09-30 19:37:44.538839 7f7802dea710 mds0.cache.dir(1) lookup (head, 'ceph_basiccheck_testspace.VArD')
> 2010-09-30 19:37:44.538849 7f7802dea710 mds0.cache.dir(1)   hit -> (ceph_basiccheck_testspace.VArD,head)
> 2010-09-30 19:37:44.538862 7f7802dea710 mds0.locker acquire_locks request(client4106:27 cr=0x29e7b40)
> 2010-09-30 19:37:44.538873 7f7802dea710 mds0.locker  must xlock (dn sync) [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dversion lock) pv=0 v=1 inode=0 0x2a0ae80]
> 2010-09-30 19:37:44.538890 7f7802dea710 mds0.locker  must wrlock (ifile sync) [inode 1 [...2,head] / auth{1=2,2=1} v1 snaprealm=0x29e8480 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) caps={4106=pAsLsXs/p@26} | dirfrag caps replicated 0x29e9000]
> 2010-09-30 19:37:44.538907 7f7802dea710 mds0.locker  must wrlock (inest sync) [inode 1 [...2,head] / auth{1=2,2=1} v1 snaprealm=0x29e8480 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) caps={4106=pAsLsXs/p@26} | dirfrag caps replicated 0x29e9000]
> 2010-09-30 19:37:44.538926 7f7802dea710 mds0.locker  must wrlock (dversion lock) [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dversion lock) pv=0 v=1 inode=0 0x2a0ae80]
> 2010-09-30 19:37:44.538940 7f7802dea710 mds0.locker  must rdlock (iauth sync) [inode 1 [...2,head] / auth{1=2,2=1} v1 snaprealm=0x29e8480 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) caps={4106=pAsLsXs/p@26} | dirfrag caps replicated 0x29e9000]
> 2010-09-30 19:37:44.538958 7f7802dea710 mds0.locker  must rdlock (isnap sync) [inode 1 [...2,head] / auth{1=2,2=1} v1 snaprealm=0x29e8480 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) caps={4106=pAsLsXs/p@26} | dirfrag caps replicated 0x29e9000]
> 2010-09-30 19:37:44.538975 7f7802dea710 mds0.locker  must rdlock (dn sync) [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dversion lock) pv=0 v=1 inode=0 0x2a0ae80]
> 2010-09-30 19:37:44.538989 7f7802dea710 mds0.locker  must authpin [inode 1 [...2,head] / auth{1=2,2=1} v1 snaprealm=0x29e8480 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) caps={4106=pAsLsXs/p@26} | dirfrag caps replicated 0x29e9000]
> 2010-09-30 19:37:44.539006 7f7802dea710 mds0.locker  must authpin [inode 1 [...2,head] / auth{1=2,2=1} v1 snaprealm=0x29e8480 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) caps={4106=pAsLsXs/p@26} | dirfrag caps replicated 0x29e9000]
> 2010-09-30 19:37:44.539023 7f7802dea710 mds0.locker  must authpin [inode 1 [...2,head] / auth{1=2,2=1} v1 snaprealm=0x29e8480 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) caps={4106=pAsLsXs/p@26} | dirfrag caps replicated 0x29e9000]
> 2010-09-30 19:37:44.539039 7f7802dea710 mds0.locker  must authpin [inode 1 [...2,head] / auth{1=2,2=1} v1 snaprealm=0x29e8480 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) caps={4106=pAsLsXs/p@26} | dirfrag caps replicated 0x29e9000]
> 2010-09-30 19:37:44.539059 7f7802dea710 mds0.locker  must authpin [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dversion lock) pv=0 v=1 inode=0 0x2a0ae80]
> 2010-09-30 19:37:44.539073 7f7802dea710 mds0.locker  must authpin [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dversion lock) pv=0 v=1 inode=0 0x2a0ae80]
> 2010-09-30 19:37:44.539085 7f7802dea710 mds0.locker  auth_pinning [inode 1 [...2,head] / auth{1=2,2=1} v1 snaprealm=0x29e8480 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) caps={4106=pAsLsXs/p@26} | dirfrag caps replicated 0x29e9000]
> 2010-09-30 19:37:44.539103 7f7802dea710 mds0.cache.ino(1) auth_pin by 0x2a1a000 on [inode 1 [...2,head] / auth{1=2,2=1} v1 ap=1 snaprealm=0x29e8480 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) caps={4106=pAsLsXs/p@26} | dirfrag caps replicated authpin 0x29e9000] now 1+0
> 2010-09-30 19:37:44.539121 7f7802dea710 mds0.locker  already auth_pinned [inode 1 [...2,head] / auth{1=2,2=1} v1 ap=1 snaprealm=0x29e8480 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) caps={4106=pAsLsXs/p@26} | dirfrag caps replicated authpin 0x29e9000]
> 2010-09-30 19:37:44.539138 7f7802dea710 mds0.locker  already auth_pinned [inode 1 [...2,head] / auth{1=2,2=1} v1 ap=1 snaprealm=0x29e8480 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) caps={4106=pAsLsXs/p@26} | dirfrag caps replicated authpin 0x29e9000]
> 2010-09-30 19:37:44.539155 7f7802dea710 mds0.locker  already auth_pinned [inode 1 [...2,head] / auth{1=2,2=1} v1 ap=1 snaprealm=0x29e8480 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) caps={4106=pAsLsXs/p@26} | dirfrag caps replicated authpin 0x29e9000]
> 2010-09-30 19:37:44.539172 7f7802dea710 mds0.locker  auth_pinning [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dversion lock) pv=0 v=1 inode=0 0x2a0ae80]
> 2010-09-30 19:37:44.539185 7f7802dea710 mds0.cache.den(1 ceph_basiccheck_testspace.VArD) auth_pin by 0x2a1a000 on [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dversion lock) pv=0 v=1 ap=1+0 inode=0 | authpin 0x2a0ae80] now 1+0
> 2010-09-30 19:37:44.539199 7f7802dea710 mds0.cache.dir(1) adjust_nested_auth_pins 1/1 on [dir 1 / [2,head] auth{1=2,2=1} v=1 cv=1/1 REP dir_auth=0 ap=0+1+1 state=1073741826|complete f(v0 1=0+1) n(v0 1=0+1) hs=1+6,ss=0+0 | child subtree replicated 0x29fd000] count now 0 + 1
> 2010-09-30 19:37:44.539216 7f7802dea710 mds0.locker  already auth_pinned [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dversion lock) pv=0 v=1 ap=1+0 inode=0 | authpin 0x2a0ae80]
> 2010-09-30 19:37:44.539231 7f7802dea710 mds0.locker local_wrlock_start  on (dversion lock) on [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dversion lock) pv=0 v=1 ap=1+0 inode=0 | authpin 0x2a0ae80]
> 2010-09-30 19:37:44.539246 7f7802dea710 mds0.locker  got wrlock on (dversion lock w=1 last_client=4106) [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dversion lock w=1 last_client=4106) pv=0 v=1 ap=1+0 inode=0 | lock authpin 0x2a0ae80]
> 2010-09-30 19:37:44.539262 7f7802dea710 mds0.locker xlock_start on (dn sync) on [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dversion lock w=1 last_client=4106) pv=0 v=1 ap=1+0 inode=0 | lock authpin 0x2a0ae80]
> 2010-09-30 19:37:44.539276 7f7802dea710 mds0.locker simple_lock on (dn sync) on [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dversion lock w=1 last_client=4106) pv=0 v=1 ap=1+0 inode=0 | lock authpin 0x2a0ae80]
> 2010-09-30 19:37:44.539293 7f7802dea710 mds0.locker simple_xlock on (dn lock) on [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dn lock) (dversion lock w=1 last_client=4106) pv=0 v=1 ap=1+0 inode=0 | lock authpin 0x2a0ae80]
> 2010-09-30 19:37:44.539308 7f7802dea710 mds0.cache.den(1 ceph_basiccheck_testspace.VArD) auth_pin by 0x2a0afc8 on [dentry #1/ceph_basiccheck_testspace.VArD [2,head] auth NULL (dn lock) (dversion lock w=1 last_client=4106) pv=0 v=1 ap=2+0 inode=0 | lock authpin 0x2a0ae80] now 2+0
> 2010-09-30 19:37:44.539323 7f7802dea710 mds0.cache.dir(1) adjust_nested_auth_pins 1/1 on [dir 1 / [2,head] auth{1=2,2=1} v=1 cv=1/1 REP dir_auth=0 ap=0+2+2 state=1073741826|complete f(v0 1=0+1) n(v0 1=0+1) hs=1+6,ss=0+0 | child subtree replicated 0x29fd000] count now 0 + 2
> mds/Locker.cc: In function 'void Locker::simple_xlock(SimpleLock*)':
> mds/Locker.cc:3138: FAILED assert("shouldn't be called if we are already xlockable" == 0)
>  ceph version 0.22~rc (7657a6d5b30dd181350acf19681847d9c8f5d694)
>  1: (Locker::xlock_start(SimpleLock*, MDRequest*)+0x2ab) [0x5811ab]
>  2: (Locker::acquire_locks(MDRequest*, std::set<SimpleLock*, std::less<SimpleLock*>, std::allocator<SimpleLock*> >&, std::set<SimpleLock*, std::less<SimpleLock*>, std::allocator<SimpleLock*> >&, std::set<SimpleLock*, std::less<SimpleLock*>, std::allocator<SimpleLock*> >&)+0x1749) [0x586a99]
>  3: (Server::handle_client_openc(MDRequest*)+0x407) [0x4dd737]
>  4: (Server::handle_client_request(MClientRequest*)+0x340) [0x4e2990]
>  5: (MDS::_dispatch(Message*)+0x2598) [0x49e038]
>  6: (MDS::ms_dispatch(Message*)+0x5b) [0x49e1ab]
>  7: (SimpleMessenger::dispatch_entry()+0x67a) [0x483f9a]
>  8: (SimpleMessenger::DispatchThread::entry()+0x4d) [0x47a4ed]
>  9: (Thread::_entry_func(void*)+0x7) [0x48dd17]
>  10: (()+0x68ba) [0x7f780553b8ba]
>  11: (clone()+0x6d) [0x7f78044ef02d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
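
When a log like the one above is long, the failed assert and the backtrace
symbols can be pulled out mechanically. The following is only a rough sketch:
the function names (strip_quote, summarize_crash) and the regular expressions
are assumptions written for this note, based solely on the line shapes visible
in this log, and are not part of Ceph or any Ceph tool.

```python
import re

# Assumed shape of the assertion line, as seen in the log above:
#   mds/Locker.cc:3138: FAILED assert("..." == 0)
ASSERT_RE = re.compile(
    r'(?P<file>[\w./]+):(?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$'
)

# Assumed shape of a numbered backtrace frame:
#   1: (Locker::xlock_start(SimpleLock*, MDRequest*)+0x2ab) [0x5811ab]
FRAME_RE = re.compile(
    r'^\s*(?P<idx>\d+): \((?P<sym>.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]\s*$'
)


def strip_quote(line):
    """Drop leading '>' mail-quote markers (hypothetical helper)."""
    return re.sub(r'^(?:>\s?)+', '', line)


def summarize_crash(text):
    """Return ((file, line, expr) or None, [frame symbols]) for a log dump."""
    failed = None
    frames = []
    for raw in text.splitlines():
        line = strip_quote(raw)
        m = ASSERT_RE.search(line)
        if m:
            failed = (m.group('file'), int(m.group('line')), m.group('expr'))
            continue
        m = FRAME_RE.match(line)
        if m:
            frames.append(m.group('sym'))
    return failed, frames


# A few lines copied from the quoted log, used as sample input.
SAMPLE = """\
> 2010-09-30 19:37:44.538704 7f7802dea710 mds0.locker eval done
> mds/Locker.cc: In function 'void Locker::simple_xlock(SimpleLock*)':
> mds/Locker.cc:3138: FAILED assert("shouldn't be called if we are already xlockable" == 0)
>  1: (Locker::xlock_start(SimpleLock*, MDRequest*)+0x2ab) [0x5811ab]
>  10: (()+0x68ba) [0x7f780553b8ba]
"""

if __name__ == "__main__":
    failed, frames = summarize_crash(SAMPLE)
    print(failed)
    for sym in frames:
        print(sym)
```

Timestamped debug lines are skipped because they do not match either
pattern; frames with no symbol name come back as "()".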
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 