Thanks for sending the logs so quickly.

626 2014-03-18 00:58:01.009623 7fba5cbbe700 10 mds.0.journal EMetaBlob.replay sessionmap v8632368 -(1|2) == table 7235981 prealloc [1000041df86~1] used 1000041db9e
627 2014-03-18 00:58:01.009627 7fba5cbbe700 20 mds.0.journal (session prealloc [10000373451~3e8])
628 2014-03-18 00:58:01.010696 7fba5cbbe700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' thread 7fba5cbbe700 time 2014-03-18 00:58:01.009644

The first line indicates that the version of the SessionMap loaded from disk is 7235981, while the version updated in the journal is 8632368. The difference is much larger than one would expect, given that we are only a few events into the journal at the point of the failure.

The assertion checks that the inode claimed by the journal is in the range preallocated to the client session, and it is failing because the stale SessionMap version is in use.
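To make the invariant concrete, here is a minimal sketch of the check (simplified names and types; it is not the actual mds/journal.cc code):

    // Minimal sketch of the invariant behind "FAILED assert(i == used_preallocated_ino)".
    // Simplified names and types; not the real mds/journal.cc implementation.
    #include <cassert>
    #include <cstdint>
    #include <deque>

    using inodeno_t = uint64_t;

    struct Session {
        // Inode numbers preallocated to this client, in the order they were granted.
        std::deque<inodeno_t> prealloc_inos;

        inodeno_t take_ino() {
            assert(!prealloc_inos.empty());
            inodeno_t i = prealloc_inos.front();
            prealloc_inos.pop_front();
            return i;
        }
    };

    // During replay, the ino recorded in the journal event must be the next ino
    // in the session's preallocated range. With a stale SessionMap (v7235981 on
    // disk vs. v8632368 in the journal) the ranges do not line up, so the
    // assertion fires.
    void replay_used_ino(Session& session, inodeno_t used_preallocated_ino) {
        inodeno_t i = session.take_ino();
        assert(i == used_preallocated_ino);
    }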
In version 0.72.2, there was a bug in the MDS that caused failures to write the SessionMap object to disk to be ignored. This could result in an inconsistency between the contents of the log and the contents of the SessionMap object. A check to avoid this was added in the latest code (b0dce8a0).
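In rough outline, the class of bug looks like this (hypothetical names; this is not the actual change made in b0dce8a0):

    // Sketch of the failure mode: ignoring the result of the SessionMap write.
    // Hypothetical names; not the actual code changed in b0dce8a0.
    #include <cstdio>
    #include <cstdlib>

    // Stand-in for writing the SessionMap object to the metadata pool;
    // returns 0 on success or a negative errno on failure.
    int write_sessionmap_object() { return -5; /* e.g. -EIO */ }

    void save_buggy() {
        // 0.72.2 behaviour: the error is dropped, so the journal keeps
        // advancing while the on-disk SessionMap object falls behind.
        (void)write_sessionmap_object();
    }

    void save_checked() {
        int r = write_sessionmap_object();
        if (r != 0) {
            // Surface the failure instead of letting the journal and the
            // SessionMap object silently diverge.
            std::fprintf(stderr, "SessionMap write failed: %d\n", r);
            std::abort();
        }
    }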
In a future release we will be adding tools for repairing damaged systems in cases like this, but at the moment your options are quite limited.

* If the data is replaceable, then you might simply use "ceph mds newfs" to start from scratch.

* If you can cope with losing some of the most recent modifications while keeping most of the filesystem, you could try the experimental journal reset function:

ceph-mds -i mon0 -d --reset-journal 0

This is destructive: it will discard any metadata updates that have been written to the journal but not to the backing store. However, it is less destructive than newfs. It may crash when it completes; output like the following at the beginning, before any stack trace, indicates success:

writing journal head
writing EResetJournal entry
done

We are looking forward to making the MDS and the associated tools more resilient ahead of making the filesystem a fully supported part of Ceph.

John

On Mon, Mar 17, 2014 at 5:09 PM, Luke Jing Yuan <jyluke@xxxxxxxx> wrote:
> Hi John,
>
> Thanks for responding to our issues; attached is the ceph.log file as requested. As for the ceph-mds.log, I will have to send it in 3 parts later due to our SMTP server's policy.
>
> Regards,
> Luke
>
> -----Original Message-----
> From: ceph-users-bounces@xxxxxxxxxxxxxx [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of John Spray
> Sent: Tuesday, 18 March, 2014 12:57 AM
> To: Wong Ming Tat
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Ceph MDS replaying journal
>
> Clarification: in step 1, stop the MDS service on *all* MDS servers (I notice there are standby daemons in the "ceph status" output).
>
> John
>
> On Mon, Mar 17, 2014 at 4:45 PM, John Spray <john.spray@xxxxxxxxxxx> wrote:
>> Hello,
>>
>> To understand what's gone wrong here, we'll need to increase the
>> verbosity of the logging from the MDS service and then try starting
>> it again.
>>
>> 1. Stop the MDS service (on Ubuntu this would be "stop ceph-mds-all").
>> 2. Move your old log file away so that we will have a fresh one:
>>    mv /var/log/ceph/ceph-mds.mon01.log /var/log/ceph/ceph-mds.mon01.log.old
>> 3. Start the MDS service manually (so that it just tries once instead
>>    of flapping):
>>    ceph-mds -i mon01 -f --debug-mds=20 --debug-journaler=10
>>
>> The resulting log file may be quite big, so you may want to gzip it
>> before sending it to the list.
>>
>> In addition to the MDS log, please attach your cluster log
>> (/var/log/ceph/ceph.log).
>>
>> Thanks,
>> John
>>
>> On Mon, Mar 17, 2014 at 7:02 AM, Wong Ming Tat <mt.wong@xxxxxxxx> wrote:
>>> Hi,
>>>
>>> I am receiving the MDS journal replay error shown below.
>>> I hope someone can give some information to help solve this problem.
>>>
>>> # ceph health detail
>>> HEALTH_WARN mds cluster is degraded
>>> mds cluster is degraded
>>> mds.mon01 at x.x.x.x:6800/26426 rank 0 is replaying journal
>>>
>>> # ceph -s
>>>     cluster xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>>      health HEALTH_WARN mds cluster is degraded
>>>      monmap e1: 3 mons at {mon01=x.x.x.x:6789/0,mon02=x.x.x.y:6789/0,mon03=x.x.x.z:6789/0}, election epoch 1210, quorum 0,1,2 mon01,mon02,mon03
>>>      mdsmap e17020: 1/1/1 up {0=mon01=up:replay}, 2 up:standby
>>>      osdmap e20195: 24 osds: 24 up, 24 in
>>>      pgmap v1424671: 3300 pgs, 6 pools, 793 GB data, 3284 kobjects
>>>            1611 GB used, 87636 GB / 89248 GB avail
>>>                3300 active+clean
>>>   client io 2750 kB/s rd, 0 op/s
>>>
>>> # cat /var/log/ceph/ceph-mds.mon01.log
>>> 2014-03-16 18:40:41.894404 7f0f2875c700 0 mds.0.server handle_client_file_setlock: start: 0, length: 0, client: 324186, pid: 30684, pid_ns: 18446612141968944256, type: 4
>>> 2014-03-16 18:49:09.993985 7f0f24645700 0 -- x.x.x.x:6801/3739 >> y.y.y.y:0/1662262473 pipe(0x728d2780 sd=26 :6801 s=0 pgs=0 cs=0 l=0 c=0x100adc6e0).accept peer addr is really y.y.y.y:0/1662262473 (socket is y.y.y.y:33592/0)
>>> 2014-03-16 18:49:10.000197 7f0f24645700 0 -- x.x.x.x:6801/3739 >> y.y.y.y:0/1662262473 pipe(0x728d2780 sd=26 :6801 s=0 pgs=0 cs=0 l=0 c=0x100adc6e0).accept connect_seq 0 vs existing 1 state standby
>>> 2014-03-16 18:49:10.000239 7f0f24645700 0 -- x.x.x.x:6801/3739 >> y.y.y.y:0/1662262473 pipe(0x728d2780 sd=26 :6801 s=0 pgs=0 cs=0 l=0 c=0x100adc6e0).accept peer reset, then tried to connect to us, replacing
>>> 2014-03-16 18:49:10.550726 7f4c34671780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 13282
>>> 2014-03-16 18:49:10.826713 7f4c2f6f8700 1 mds.-1.0 handle_mds_map standby
>>> 2014-03-16 18:49:10.984992 7f4c2f6f8700 1 mds.0.14 handle_mds_map i am now mds.0.14
>>> 2014-03-16 18:49:10.985010 7f4c2f6f8700 1 mds.0.14 handle_mds_map state change up:standby --> up:replay
>>> 2014-03-16 18:49:10.985017 7f4c2f6f8700 1 mds.0.14 replay_start
>>> 2014-03-16 18:49:10.985024 7f4c2f6f8700 1 mds.0.14 recovery set is
>>> 2014-03-16 18:49:10.985027 7f4c2f6f8700 1 mds.0.14 need osdmap epoch 3446, have 3445
>>> 2014-03-16 18:49:10.985030 7f4c2f6f8700 1 mds.0.14 waiting for osdmap 3446 (which blacklists prior instance)
>>> 2014-03-16 18:49:16.945500 7f4c2f6f8700 0 mds.0.cache creating system inode with ino:100
>>> 2014-03-16 18:49:16.945747 7f4c2f6f8700 0 mds.0.cache creating system inode with ino:1
>>> 2014-03-16 18:49:17.358681 7f4c2b5e1700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' thread 7f4c2b5e1700 time 2014-03-16 18:49:17.356336
>>> mds/journal.cc: 1316: FAILED assert(i == used_preallocated_ino)
>>>
>>> ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>>> 1: (EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)+0x7587) [0x5af5e7]
>>> 2: (EUpdate::replay(MDS*)+0x3a) [0x5b67ea]
>>> 3: (MDLog::_replay_thread()+0x678) [0x79dbb8]
>>> 4: (MDLog::ReplayThread::entry()+0xd) [0x58bded]
>>> 5: (()+0x7e9a) [0x7f4c33a96e9a]
>>> 6: (clone()+0x6d) [0x7f4c3298b3fd]
>>>
>>> Regards,
>>> Wong Ming Tat
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com