Crashed MDS not starting

Hi
I'm running ceph v0.56.3 on Debian wheezy, with the ceph.com debs.
My setup is three servers with 6 disks each. On each server, 5 disks are dedicated to osd's and the remaining disk is dedicated to the monitors (three, one per server) and the mds's (three, one per server, only one active at a time).
We are using cephfs from another host, mounting it with the kernel driver.

We download data from ~150 servers with rsync every night, and we try to keep 50 rsync processes running simultaneously. All of them write to the cephfs filesystem.
The directory we download all the data into is on a pool configured with min_size=2, so we have at least 2 copies of every object.

Yesterday, while our downloads were running, the mds crashed. The other mds's tried to start and then crashed as well. This morning I had some pgs stuck inactive, which I have resolved, but the mds still won't start. When I try to start it with "service ceph start mds 10", I get this in the log file:
[...]
    -3> 2013-03-06 11:24:02.304950 7f41c5afb700 10 mds.0.journal EMetaBlob.replay inotable tablev 4296440 <= table 4296754
    -2> 2013-03-06 11:24:02.304952 7f41c5afb700 10 mds.0.journal EMetaBlob.replay sessionmap v8546402 -(1|2) == table 8412050 prealloc [100004192ca~1] used 10000418ee2
    -1> 2013-03-06 11:24:02.304956 7f41c5afb700 20 mds.0.journal  (session prealloc [1000040887b~3e8])
     0> 2013-03-06 11:24:02.306239 7f41c5afb700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*)' thread 7f41c5afb700 time 2013-03-06 11:24:02.304977
mds/journal.cc: 744: FAILED assert(i == used_preallocated_ino)

 ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
 1: (EMetaBlob::replay(MDS*, LogSegment*)+0x6bd8) [0x520a78]
 2: (EUpdate::replay(MDS*)+0x38) [0x523da8]
 3: (MDLog::_replay_thread()+0x5cf) [0x6d1eaf]
 4: (MDLog::ReplayThread::entry()+0xd) [0x50458d]
 5: (()+0x6b50) [0x7f41ce2e6b50]
 6: (clone()+0x6d) [0x7f41ccc96a7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
[...]

I have searched for the failed assert (I think this is the root of the problem, something going wrong when allocating inodes), but I didn't find anything.
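
As a rough sanity check on the numbers in the log above, here is a minimal sketch that tests whether the "used" inode falls inside either preallocated range. It assumes that the log prints ranges as start~length in hexadecimal, and that the assert fires when the used inode is not the one the session expected to hand out; both are assumptions about the log format and the code, not confirmed facts. The helper names are hypothetical.

```python
def parse_interval(text):
    """Parse 'start~length' (both assumed hex) into a half-open (start, end) pair."""
    start_hex, length_hex = text.strip("[]").split("~")
    start = int(start_hex, 16)
    return start, start + int(length_hex, 16)


def contains(interval, ino):
    """Return True if ino lies inside the half-open interval."""
    start, end = interval
    return start <= ino < end


# Values copied from the log lines at -2> and -1> above.
event_prealloc = parse_interval("[100004192ca~1]")
session_prealloc = parse_interval("[1000040887b~3e8]")
used_ino = int("10000418ee2", 16)

print("used ino in event prealloc:  ", contains(event_prealloc, used_ino))
print("used ino in session prealloc:", contains(session_prealloc, used_ino))
```

Under those assumptions, the used inode lies outside both ranges, which would be consistent with the journal and the session table disagreeing about preallocated inodes.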

Right now we have no access to the filesystem. What can I do to get the mds started again?

Thanks in advance.
--
Félix Ortega Hortigüela