Clarification: in step 1, stop the MDS service on *all* MDS servers (I
notice there are standby daemons in the "ceph status" output).

John

On Mon, Mar 17, 2014 at 4:45 PM, John Spray <john.spray@xxxxxxxxxxx> wrote:
> Hello,
>
> To understand what's gone wrong here, we'll need to increase the
> verbosity of the logging from the MDS service and then try starting
> it again.
>
> 1. Stop the MDS service (on Ubuntu this would be "stop ceph-mds-all")
> 2. Move your old log file away so that we get a fresh one:
>    mv /var/log/ceph/ceph-mds.mon01.log /var/log/ceph/ceph-mds.mon01.log.old
> 3. Start the MDS service manually (so that it just tries once instead
>    of flapping):
>    ceph-mds -i mon01 -f --debug-mds=20 --debug-journaler=10
>
> The resulting log file may be quite big, so you may want to gzip it
> before sending it to the list.
>
> In addition to the MDS log, please attach your cluster log
> (/var/log/ceph/ceph.log).
>
> Thanks,
> John
>
> On Mon, Mar 17, 2014 at 7:02 AM, Wong Ming Tat <mt.wong@xxxxxxxx> wrote:
>> Hi,
>>
>> I receive the MDS replaying journal error as below.
>> Hope anyone can give some information to solve this problem.
>>
>> # ceph health detail
>> HEALTH_WARN mds cluster is degraded
>> mds cluster is degraded
>> mds.mon01 at x.x.x.x:6800/26426 rank 0 is replaying journal
>>
>> # ceph -s
>>     cluster xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>      health HEALTH_WARN mds cluster is degraded
>>      monmap e1: 3 mons at {mon01=x.x.x.x:6789/0,mon02=x.x.x.y:6789/0,mon03=x.x.x.z:6789/0}, election epoch 1210, quorum 0,1,2 mon01,mon02,mon03
>>      mdsmap e17020: 1/1/1 up {0=mon01=up:replay}, 2 up:standby
>>      osdmap e20195: 24 osds: 24 up, 24 in
>>       pgmap v1424671: 3300 pgs, 6 pools, 793 GB data, 3284 kobjects
>>             1611 GB used, 87636 GB / 89248 GB avail
>>                 3300 active+clean
>>   client io 2750 kB/s rd, 0 op/s
>>
>> # cat /var/log/ceph/ceph-mds.mon01.log
>> 2014-03-16 18:40:41.894404 7f0f2875c700  0 mds.0.server handle_client_file_setlock: start: 0, length: 0, client: 324186, pid: 30684, pid_ns: 18446612141968944256, type: 4
>> 2014-03-16 18:49:09.993985 7f0f24645700  0 -- x.x.x.x:6801/3739 >> y.y.y.y:0/1662262473 pipe(0x728d2780 sd=26 :6801 s=0 pgs=0 cs=0 l=0 c=0x100adc6e0).accept peer addr is really y.y.y.y:0/1662262473 (socket is y.y.y.y:33592/0)
>> 2014-03-16 18:49:10.000197 7f0f24645700  0 -- x.x.x.x:6801/3739 >> y.y.y.y:0/1662262473 pipe(0x728d2780 sd=26 :6801 s=0 pgs=0 cs=0 l=0 c=0x100adc6e0).accept connect_seq 0 vs existing 1 state standby
>> 2014-03-16 18:49:10.000239 7f0f24645700  0 -- x.x.x.x:6801/3739 >> y.y.y.y:0/1662262473 pipe(0x728d2780 sd=26 :6801 s=0 pgs=0 cs=0 l=0 c=0x100adc6e0).accept peer reset, then tried to connect to us, replacing
>> 2014-03-16 18:49:10.550726 7f4c34671780  0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 13282
>> 2014-03-16 18:49:10.826713 7f4c2f6f8700  1 mds.-1.0 handle_mds_map standby
>> 2014-03-16 18:49:10.984992 7f4c2f6f8700  1 mds.0.14 handle_mds_map i am now mds.0.14
>> 2014-03-16 18:49:10.985010 7f4c2f6f8700  1 mds.0.14 handle_mds_map state change up:standby --> up:replay
>> 2014-03-16 18:49:10.985017 7f4c2f6f8700  1 mds.0.14 replay_start
>> 2014-03-16 18:49:10.985024 7f4c2f6f8700  1 mds.0.14 recovery set is
>> 2014-03-16 18:49:10.985027 7f4c2f6f8700  1 mds.0.14 need osdmap epoch 3446, have 3445
>> 2014-03-16 18:49:10.985030 7f4c2f6f8700  1 mds.0.14 waiting for osdmap 3446 (which blacklists prior instance)
>> 2014-03-16 18:49:16.945500 7f4c2f6f8700  0 mds.0.cache creating system inode with ino:100
>> 2014-03-16 18:49:16.945747 7f4c2f6f8700  0 mds.0.cache creating system inode with ino:1
>> 2014-03-16 18:49:17.358681 7f4c2b5e1700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' thread 7f4c2b5e1700 time 2014-03-16 18:49:17.356336
>> mds/journal.cc: 1316: FAILED assert(i == used_preallocated_ino)
>>
>> ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>>  1: (EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)+0x7587) [0x5af5e7]
>>  2: (EUpdate::replay(MDS*)+0x3a) [0x5b67ea]
>>  3: (MDLog::_replay_thread()+0x678) [0x79dbb8]
>>  4: (MDLog::ReplayThread::entry()+0xd) [0x58bded]
>>  5: (()+0x7e9a) [0x7f4c33a96e9a]
>>  6: (clone()+0x6d) [0x7f4c3298b3fd]
>>
>> Regards,
>> Wong Ming Tat
>>
>> ________________________________
>> DISCLAIMER:
>>
>> This e-mail (including any attachments) is for the addressee(s) only and may
>> be confidential, especially as regards personal data. If you are not the
>> intended recipient, please note that any dealing, review, distribution,
>> printing, copying or use of this e-mail is strictly prohibited. If you have
>> received this email in error, please notify the sender immediately and
>> delete the original message (including any attachments).
>>
>> MIMOS Berhad is a research and development institution under the purview of
>> the Malaysian Ministry of Science, Technology and Innovation.
>> Opinions, conclusions and other information in this e-mail that do not
>> relate to the official business of MIMOS Berhad and/or its subsidiaries
>> shall be understood as neither given nor endorsed by MIMOS Berhad and/or
>> its subsidiaries and neither MIMOS Berhad nor its subsidiaries accepts
>> responsibility for the same. All liability arising from or in connection
>> with computer viruses and/or corrupted e-mails is excluded to the fullest
>> extent permitted by law.
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
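
[Editor's note: the debug-capture procedure from John's reply, with his clarification about stopping *all* MDS daemons, can be sketched as a single shell session. The commands and the `mon01` daemon id come from the thread itself; the assumption that each standby runs on its own host, and the final `gzip` step, are illustrative and may need adapting to the actual cluster layout.]

```shell
# Run this on EVERY host with an MDS daemon first -- the "ceph -s" output
# shows two standbys ("2 up:standby"), and a standby left running would
# simply take over the rank and resume flapping:
stop ceph-mds-all            # Upstart syntax, as on the Ubuntu hosts discussed

# On the host that held rank 0 (mon01 in this thread), move the old log
# aside so the next run writes a fresh one:
mv /var/log/ceph/ceph-mds.mon01.log /var/log/ceph/ceph-mds.mon01.log.old

# Start the daemon once, in the foreground (-f), with verbose MDS and
# journaler logging; it will hit the replay assertion and exit instead
# of being respawned repeatedly:
ceph-mds -i mon01 -f --debug-mds=20 --debug-journaler=10

# The resulting log can be large; compress it before mailing to the list:
gzip /var/log/ceph/ceph-mds.mon01.log
```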