Re: Ceph MDS replaying journal

Thanks for sending the logs so quickly.

626 2014-03-18 00:58:01.009623 7fba5cbbe700 10 mds.0.journal EMetaBlob.replay sessionmap v8632368 -(1|2) == table 7235981 prealloc [1000041df86~1] used 1000041db9e
627 2014-03-18 00:58:01.009627 7fba5cbbe700 20 mds.0.journal  (session prealloc [10000373451~3e8])
628 2014-03-18 00:58:01.010696 7fba5cbbe700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' thread 7fba5cbbe700 time 2014-03-18 00:58:01.009644

The first line indicates that the version of the SessionMap loaded
from disk is 7235981, while the version updated in the journal is
8632368.  The difference is much larger than one would expect, given
that we are only a few events into the journal at the point of the
failure.  The assertion checks that the inode claimed by the journal
is in the range preallocated to the client session, and it is failing
because the stale SessionMap version is in use.
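
If it helps to see the mechanism, here is a minimal standalone sketch
of that check.  This is simplified and uses made-up names; it is not
the actual mds/journal.cc code:

    // Hedged illustration only -- a simplified model, not the actual
    // mds/journal.cc code.  A session carries the inodes preallocated
    // to its client, in order; replay pops the next expected inode and
    // asserts it matches the inode the journaled event says was used.
    #include <cassert>
    #include <cstdint>
    #include <deque>

    using inodeno_t = uint64_t;

    struct Session {
      uint64_t version;                // SessionMap version of this state
      std::deque<inodeno_t> prealloc;  // inos handed out, in order
    };

    void replay_used_ino(Session &s, inodeno_t used_preallocated_ino) {
      assert(!s.prealloc.empty());
      inodeno_t i = s.prealloc.front();
      s.prealloc.pop_front();
      // Equivalent of "FAILED assert(i == used_preallocated_ino)":
      // fires when a stale SessionMap's prealloc set no longer lines
      // up with what the journal recorded.
      assert(i == used_preallocated_ino);
    }

    int main() {
      // On-disk SessionMap is at v7235981, but the journal was written
      // against v8632368, so the preallocated range is out of date.
      Session stale{7235981, {0x1000041df86}};
      replay_used_ino(stale, 0x1000041db9e);  // aborts, as in the log
      return 0;
    }

Compiling and running this aborts on the final assert, which is the
same shape of failure as the journal.cc:1316 assert above.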

In version 0.72.2, there was a bug in the MDS that caused failed
writes of the SessionMap object to disk to be silently ignored.  This
could result in an inconsistency between the contents of the log and
the contents of the SessionMap object.  A check was added to avoid
this in the latest code (b0dce8a0).
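
For illustration only, here is a hypothetical sketch of how ignoring
a failed SessionMap write lets the journaled version race ahead of
the on-disk one.  The names and structure are mine, not the actual
code paths or the b0dce8a0 patch:

    // Hedged sketch of the failure mode -- simplified, hypothetical
    // names, not the real Ceph code or the actual b0dce8a0 fix.
    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>

    struct SessionMapStore {
      uint64_t version_in_memory = 0;  // version journaled with events
      uint64_t version_on_disk   = 0;  // version replay will later load

      void journal_update() { ++version_in_memory; }

      // Simulates the completion callback for an async write of the
      // SessionMap object, finishing with return code r.
      void on_save_completion(int r, bool check_result) {
        if (r != 0) {
          if (check_result) {
            // Post-fix style of behaviour: fail loudly rather than
            // let the log and the SessionMap object silently drift.
            std::fprintf(stderr, "SessionMap write failed (r=%d)\n", r);
            std::abort();
          }
          return;  // 0.72.2-style behaviour: the error is ignored
        }
        version_on_disk = version_in_memory;
      }
    };

    int main() {
      SessionMapStore s;
      for (int i = 0; i < 5; ++i) {
        s.journal_update();
        s.on_save_completion(/*r=*/-5, /*check_result=*/false);  // EIO
      }
      // Every save failed silently: the journal now references v5
      // while the on-disk object is still at v0 -- stale on replay.
      std::printf("journal v%llu vs on-disk v%llu\n",
                  (unsigned long long)s.version_in_memory,
                  (unsigned long long)s.version_on_disk);
      return 0;
    }

After the loop the journaled version is 5 while the on-disk version
is still 0; scaled up, that is the 8632368 vs 7235981 gap in your log.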

In a future release we will be adding tools for repairing damaged
systems in cases like this, but at the moment your options are quite
limited.
 * If the data is replaceable then you might simply use "ceph mds
newfs" to start from scratch.
 * If you can cope with losing some of the most recent modifications
but keeping most of the filesystem, you could try the experimental
journal reset function:
     ceph-mds -i mon01 -d --reset-journal 0
   This is destructive: it will discard any metadata updates that have
been written to the journal but not to the backing store.  However, it
is less destructive than newfs.  It may crash when it completes; look
for output like this at the beginning, before any stack trace, to
confirm success:
   writing journal head
   writing EResetJournal entry
   done

We look forward to making the MDS and the associated tools more
resilient ahead of making the filesystem a fully supported part of
Ceph.

John

On Mon, Mar 17, 2014 at 5:09 PM, Luke Jing Yuan <jyluke@xxxxxxxx> wrote:
> Hi John,
>
> Thanks for responding to our issues; attached is the ceph.log file, as requested. As for the ceph-mds.log, I will have to send it in three parts later due to our SMTP server's policy.
>
> Regards,
> Luke
>
> -----Original Message-----
> From: ceph-users-bounces@xxxxxxxxxxxxxx [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of John Spray
> Sent: Tuesday, 18 March, 2014 12:57 AM
> To: Wong Ming Tat
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  Ceph MDS replaying journal
>
> Clarification: in step 1, stop the MDS service on *all* MDS servers (I notice there are standby daemons in the "ceph status" output).
>
> John
>
> On Mon, Mar 17, 2014 at 4:45 PM, John Spray <john.spray@xxxxxxxxxxx> wrote:
>> Hello,
>>
>> To understand what's gone wrong here, we'll need to increase the
>> verbosity of the logging from the MDS service and then try starting
>> it again.
>>
>> 1. Stop the MDS service (on Ubuntu this would be "stop ceph-mds-all")
>> 2. Move your old log file away so that we will have a fresh one:
>>    mv /var/log/ceph/ceph-mds.mon01.log /var/log/ceph/ceph-mds.mon01.log.old
>> 3. Start the MDS service manually (so that it just tries once instead
>> of flapping):
>>    ceph-mds -i mon01 -f --debug-mds=20 --debug-journaler=10
>>
>> The resulting log file may be quite big, so you may want to gzip it
>> before sending it to the list.
>>
>> In addition to the MDS log, please attach your cluster log
>> (/var/log/ceph/ceph.log).
>>
>> Thanks,
>> John
>>
>> On Mon, Mar 17, 2014 at 7:02 AM, Wong Ming Tat <mt.wong@xxxxxxxx> wrote:
>>> Hi,
>>>
>>> I am receiving the MDS "replaying journal" error shown below.  I hope
>>> someone can give some information to help solve this problem.
>>>
>>> # ceph health detail
>>> HEALTH_WARN mds cluster is degraded
>>> mds cluster is degraded
>>> mds.mon01 at x.x.x.x:6800/26426 rank 0 is replaying journal
>>>
>>> # ceph -s
>>>     cluster xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>>      health HEALTH_WARN mds cluster is degraded
>>>      monmap e1: 3 mons at {mon01=x.x.x.x:6789/0,mon02=x.x.x.y:6789/0,mon03=x.x.x.z:6789/0}, election epoch 1210, quorum 0,1,2 mon01,mon02,mon03
>>>      mdsmap e17020: 1/1/1 up {0=mon01=up:replay}, 2 up:standby
>>>      osdmap e20195: 24 osds: 24 up, 24 in
>>>       pgmap v1424671: 3300 pgs, 6 pools, 793 GB data, 3284 kobjects
>>>             1611 GB used, 87636 GB / 89248 GB avail
>>>                 3300 active+clean
>>>   client io 2750 kB/s rd, 0 op/s
>>>
>>> # cat /var/log/ceph/ceph-mds.mon01.log
>>> 2014-03-16 18:40:41.894404 7f0f2875c700  0 mds.0.server handle_client_file_setlock: start: 0, length: 0, client: 324186, pid: 30684, pid_ns: 18446612141968944256, type: 4
>>> 2014-03-16 18:49:09.993985 7f0f24645700  0 -- x.x.x.x:6801/3739 >> y.y.y.y:0/1662262473 pipe(0x728d2780 sd=26 :6801 s=0 pgs=0 cs=0 l=0 c=0x100adc6e0).accept peer addr is really y.y.y.y:0/1662262473 (socket is y.y.y.y:33592/0)
>>> 2014-03-16 18:49:10.000197 7f0f24645700  0 -- x.x.x.x:6801/3739 >> y.y.y.y:0/1662262473 pipe(0x728d2780 sd=26 :6801 s=0 pgs=0 cs=0 l=0 c=0x100adc6e0).accept connect_seq 0 vs existing 1 state standby
>>> 2014-03-16 18:49:10.000239 7f0f24645700  0 -- x.x.x.x:6801/3739 >> y.y.y.y:0/1662262473 pipe(0x728d2780 sd=26 :6801 s=0 pgs=0 cs=0 l=0 c=0x100adc6e0).accept peer reset, then tried to connect to us, replacing
>>> 2014-03-16 18:49:10.550726 7f4c34671780  0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 13282
>>> 2014-03-16 18:49:10.826713 7f4c2f6f8700  1 mds.-1.0 handle_mds_map standby
>>> 2014-03-16 18:49:10.984992 7f4c2f6f8700  1 mds.0.14 handle_mds_map i am now mds.0.14
>>> 2014-03-16 18:49:10.985010 7f4c2f6f8700  1 mds.0.14 handle_mds_map state change up:standby --> up:replay
>>> 2014-03-16 18:49:10.985017 7f4c2f6f8700  1 mds.0.14 replay_start
>>> 2014-03-16 18:49:10.985024 7f4c2f6f8700  1 mds.0.14  recovery set is
>>> 2014-03-16 18:49:10.985027 7f4c2f6f8700  1 mds.0.14  need osdmap epoch 3446, have 3445
>>> 2014-03-16 18:49:10.985030 7f4c2f6f8700  1 mds.0.14  waiting for osdmap 3446 (which blacklists prior instance)
>>> 2014-03-16 18:49:16.945500 7f4c2f6f8700  0 mds.0.cache creating system inode with ino:100
>>> 2014-03-16 18:49:16.945747 7f4c2f6f8700  0 mds.0.cache creating system inode with ino:1
>>> 2014-03-16 18:49:17.358681 7f4c2b5e1700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' thread 7f4c2b5e1700 time 2014-03-16 18:49:17.356336
>>> mds/journal.cc: 1316: FAILED assert(i == used_preallocated_ino)
>>>
>>> ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>>> 1: (EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)+0x7587) [0x5af5e7]
>>> 2: (EUpdate::replay(MDS*)+0x3a) [0x5b67ea]
>>> 3: (MDLog::_replay_thread()+0x678) [0x79dbb8]
>>> 4: (MDLog::ReplayThread::entry()+0xd) [0x58bded]
>>> 5: (()+0x7e9a) [0x7f4c33a96e9a]
>>> 6: (clone()+0x6d) [0x7f4c3298b3fd]
>>>
>>> Regards,
>>> Wong Ming Tat