Re: mds isn't working anymore after osd's running full

John Spray <john.spray@xxxxxxxxxx> · Thu, 16 Oct 2014 11:23:27 +0100



Following up: firefly fix for undump is: https://github.com/ceph/ceph/pull/2734

Jasper: if you still need to try undumping on this existing firefly
cluster, you can download ceph-mds packages built from the
wip-firefly-undump branch at
http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/

Cheers,
John

On Wed, Oct 15, 2014 at 8:15 PM, John Spray <john.spray@xxxxxxxxxx> wrote:
> Sadly undump has been broken for quite some time (it was fixed in
> giant as part of creating cephfs-journal-tool).  If there's a one line
> fix for this then it's probably worth putting in firefly since it's a
> long term supported branch -- I'll do that now.
>
> John
>
> On Wed, Oct 15, 2014 at 8:23 AM, Jasper Siero
> <jasper.siero@xxxxxxxxxxxxxxxxx> wrote:
>> Hello Greg,
>>
>> The dump and reset of the journal were successful:
>>
>> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph --dump-journal 0 journaldumptgho-mon001
>> journal is 9483323613~134215459
>> read 134213311 bytes at offset 9483323613
>> wrote 134213311 bytes at offset 9483323613 to journaldumptgho-mon001
>> NOTE: this is a _sparse_ file; you can
>>         $ tar cSzf journaldumptgho-mon001.tgz journaldumptgho-mon001
>>       to efficiently compress it while preserving sparseness.
>>
>> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph --reset-journal 0
>> old journal was 9483323613~134215459
>> new journal start will be 9621733376 (4194304 bytes past old end)
>> writing journal head
>> writing EResetJournal entry
>> done
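
The numbers printed by the reset can be cross-checked with a little arithmetic. The following is only a sketch: it assumes the extent is printed as offset~length and that the 4194304-byte gap mentioned in the output is simply added to the old journal end. The helper names are made up for illustration and are not part of any Ceph tool.

```python
import re

# Assumption: the gap is the 4194304 bytes mentioned in the reset output above.
GAP_BYTES = 4194304


def parse_extent(text):
    """Parse an 'offset~length' extent such as '9483323613~134215459'."""
    match = re.fullmatch(r"(\d+)~(\d+)", text.strip())
    if match is None:
        raise ValueError(f"not an offset~length extent: {text!r}")
    return int(match.group(1)), int(match.group(2))


def expected_new_start(extent, gap=GAP_BYTES):
    """Return the old journal end (offset + length) plus the gap."""
    offset, length = parse_extent(extent)
    return offset + length + gap


old_offset, old_length = parse_extent("9483323613~134215459")
print(old_offset + old_length)                     # 9617539072
print(expected_new_start("9483323613~134215459"))  # 9621733376
```

The second value matches the "new journal start" line printed by --reset-journal, so the reported extent and the new start are at least consistent with each other.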
>>
>>
>> Undumping the journal was not successful; the error "client_lock.is_locked()" is shown several times in the output. The mds is not running when I start the undump, so maybe I have forgotten something?
>>
>> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 journaldumptgho-mon001
>> undump journaldumptgho-mon001
>> start 9483323613 len 134213311
>> writing header 200.00000000
>> osdc/Objecter.cc: In function 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 time 2014-10-15 09:09:32.020287
>> osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())
>>  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>>  1: /usr/bin/ceph-mds() [0x80f15e]
>>  2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
>>  3: (main()+0x1632) [0x569c62]
>>  4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
>>  5: /usr/bin/ceph-mds() [0x567d99]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> 2014-10-15 09:09:32.021313 7fec3e5ad7a0 -1 osdc/Objecter.cc: In function 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 time 2014-10-15 09:09:32.020287
>> osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())
>>
>>  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>>  1: /usr/bin/ceph-mds() [0x80f15e]
>>  2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
>>  3: (main()+0x1632) [0x569c62]
>>  4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
>>  5: /usr/bin/ceph-mds() [0x567d99]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>>      0> 2014-10-15 09:09:32.021313 7fec3e5ad7a0 -1 osdc/Objecter.cc: In function 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 time 2014-10-15 09:09:32.020287
>> osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())
>>
>>  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>>  1: /usr/bin/ceph-mds() [0x80f15e]
>>  2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
>>  3: (main()+0x1632) [0x569c62]
>>  4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
>>  5: /usr/bin/ceph-mds() [0x567d99]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> terminate called after throwing an instance of 'ceph::FailedAssertion'
>> *** Caught signal (Aborted) **
>>  in thread 7fec3e5ad7a0
>>  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>>  1: /usr/bin/ceph-mds() [0x82ef61]
>>  2: (()+0xf710) [0x7fec3d9a6710]
>>  3: (gsignal()+0x35) [0x7fec3ca7c635]
>>  4: (abort()+0x175) [0x7fec3ca7de15]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fec3d336a5d]
>>  6: (()+0xbcbe6) [0x7fec3d334be6]
>>  7: (()+0xbcc13) [0x7fec3d334c13]
>>  8: (()+0xbcd0e) [0x7fec3d334d0e]
>>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f2) [0x94b812]
>>  10: /usr/bin/ceph-mds() [0x80f15e]
>>  11: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
>>  12: (main()+0x1632) [0x569c62]
>>  13: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
>>  14: /usr/bin/ceph-mds() [0x567d99]
>> 2014-10-15 09:09:32.024248 7fec3e5ad7a0 -1 *** Caught signal (Aborted) **
>>  in thread 7fec3e5ad7a0
>>
>>  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>>  1: /usr/bin/ceph-mds() [0x82ef61]
>>  2: (()+0xf710) [0x7fec3d9a6710]
>>  3: (gsignal()+0x35) [0x7fec3ca7c635]
>>  4: (abort()+0x175) [0x7fec3ca7de15]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fec3d336a5d]
>>  6: (()+0xbcbe6) [0x7fec3d334be6]
>>  7: (()+0xbcc13) [0x7fec3d334c13]
>>  8: (()+0xbcd0e) [0x7fec3d334d0e]
>>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f2) [0x94b812]
>>  10: /usr/bin/ceph-mds() [0x80f15e]
>>  11: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
>>  12: (main()+0x1632) [0x569c62]
>>  13: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
>>  14: /usr/bin/ceph-mds() [0x567d99]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>>      0> 2014-10-15 09:09:32.024248 7fec3e5ad7a0 -1 *** Caught signal (Aborted) **
>>  in thread 7fec3e5ad7a0
>>
>>  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>>  1: /usr/bin/ceph-mds() [0x82ef61]
>>  2: (()+0xf710) [0x7fec3d9a6710]
>>  3: (gsignal()+0x35) [0x7fec3ca7c635]
>>  4: (abort()+0x175) [0x7fec3ca7de15]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fec3d336a5d]
>>  6: (()+0xbcbe6) [0x7fec3d334be6]
>>  7: (()+0xbcc13) [0x7fec3d334c13]
>>  8: (()+0xbcd0e) [0x7fec3d334d0e]
>>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f2) [0x94b812]
>>  10: /usr/bin/ceph-mds() [0x80f15e]
>>  11: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
>>  12: (main()+0x1632) [0x569c62]
>>  13: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
>>  14: /usr/bin/ceph-mds() [0x567d99]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> Aborted
>>
>> Jasper
>> ________________________________________
>> From: Gregory Farnum [greg@xxxxxxxxxxx]
>> Sent: Tuesday, 14 October 2014 23:40
>> To: Jasper Siero
>> CC: ceph-users
>> Subject: Re: mds isn't working anymore after osd's running full
>>
>> ceph-mds --undump-journal <rank> <journal-file>
>> Looks like it accidentally (or on purpose? you can break things with
>> it) got left out of the help text.
>>
>> On Tue, Oct 14, 2014 at 8:19 AM, Jasper Siero
>> <jasper.siero@xxxxxxxxxxxxxxxxx> wrote:
>>> Hello Greg,
>>>
>>> I successfully dumped the journal to a file:
>>>
>>> journal is 9483323613~134215459
>>> read 134213311 bytes at offset 9483323613
>>> wrote 134213311 bytes at offset 9483323613 to journaldumptgho
>>> NOTE: this is a _sparse_ file; you can
>>>         $ tar cSzf journaldumptgho.tgz journaldumptgho
>>>       to efficiently compress it while preserving sparseness.
>>>
>>> I see the option for resetting the mds journal but I can't find the option for undumping/importing the journal:
>>>
>>>  usage: ceph-mds -i name [flags] [[--journal_check rank]|[--hot-standby][rank]]
>>>   -m monitorip:port
>>>         connect to monitor at given address
>>>   --debug_mds n
>>>         debug MDS level (e.g. 10)
>>>   --dump-journal rank filename
>>>         dump the MDS journal (binary) for rank.
>>>   --dump-journal-entries rank filename
>>>         dump the MDS journal (JSON) for rank.
>>>   --journal-check rank
>>>         replay the journal for rank, then exit
>>>   --hot-standby rank
>>>         start up as a hot standby for rank
>>>   --reset-journal rank
>>>         discard the MDS journal for rank, and replace it with a single
>>>         event that updates/resets inotable and sessionmap on replay.
>>>
>>> Do you know how to "undump" the journal back into ceph?
>>>
>>> Jasper
>>>
>>> ________________________________________
>>> From: Gregory Farnum [greg@xxxxxxxxxxx]
>>> Sent: Friday, 10 October 2014 23:45
>>> To: Jasper Siero
>>> CC: ceph-users
>>> Subject: Re: mds isn't working anymore after osd's running full
>>>
>>> Ugh, "debug journaler", not "debug journaled."
>>>
>>> That said, the filer output tells me that you're missing an object out
>>> of the MDS log. (200.000008f5) I think this issue should be resolved
>>> if you "dump" the journal to a file, "reset" it, and then "undump" it.
>>> (These are commands you can invoke from ceph-mds.)
>>> I haven't done this myself in a long time, so there may be some hard
>>> edges around it. In particular, I'm not sure if the dumped journal
>>> file will stop when the data stops, or if it will be a little too
>>> long. If so, we can fix that by truncating the dumped file to the
>>> proper length and resetting and undumping again.
>>> (And just to harp on it, this journal manipulation is a lot simpler in
>>> Giant... ;) )
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
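
To relate an object name like 200.000008f5 to byte offsets in the journal, one can map between the two. This is a rough sketch under assumptions that are not confirmed in this thread: objects are 4 MiB each (the same 4194304 figure the reset output prints), and names have the form <prefix>.<index as 8 hex digits>, as in the "200.00000000" header object. The function names are hypothetical.

```python
# Assumptions (not confirmed by the thread): 4 MiB objects, and names of the
# form "<prefix hex>.<object index as 8 hex digits>", e.g. "200.00000000".
OBJECT_SIZE = 4 * 1024 * 1024
JOURNAL_PREFIX = 0x200


def object_name(offset, prefix=JOURNAL_PREFIX, object_size=OBJECT_SIZE):
    """Name of the object that would hold the given journal byte offset."""
    index = offset // object_size
    return f"{prefix:x}.{index:08x}"


def object_start(name, object_size=OBJECT_SIZE):
    """Byte offset at which the named object would begin."""
    index = int(name.split(".", 1)[1], 16)
    return index * object_size


print(object_name(9483323613))       # 200.000008d5
print(object_start("200.000008f5"))  # 9617539072
```

Under these assumptions, 200.000008f5 would begin at byte 9617539072, which is the end of the journal extent 9483323613~134215459 reported by the dump in this thread. That is only an arithmetic observation, not a diagnosis.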
>>>
>>> On Wed, Oct 8, 2014 at 7:11 AM, Jasper Siero
>>> <jasper.siero@xxxxxxxxxxxxxxxxx> wrote:
>>>> Hello Greg,
>>>>
>>>> No problem, thanks for looking into the log. I attached the log to this email.
>>>> I'm looking forward to the new release because it would be nice to have more ways to diagnose problems.
>>>>
>>>> Kind regards,
>>>>
>>>> Jasper Siero
>>>> ________________________________________
>>>> From: Gregory Farnum [greg@xxxxxxxxxxx]
>>>> Sent: Tuesday, 7 October 2014 19:45
>>>> To: Jasper Siero
>>>> CC: ceph-users
>>>> Subject: Re: mds isn't working anymore after osd's running full
>>>>
>>>> Sorry; I guess this fell off my radar.
>>>>
>>>> The issue here is not that it's waiting for an osdmap; it got the
>>>> requested map and went into replay mode almost immediately. In fact
>>>> the log looks good except that it seems to finish replaying the log
>>>> and then simply fail to transition into active. Generate a new one,
>>>> adding in "debug journaled = 20" and "debug filer = 20", and we can
>>>> probably figure out how to fix it.
>>>> (This diagnosis is much easier in the upcoming Giant!)
>>>> -Greg
>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>
>>>>
>>>> On Tue, Oct 7, 2014 at 7:55 AM, Jasper Siero
>>>> <jasper.siero@xxxxxxxxxxxxxxxxx> wrote:
>>>>> Hello Gregory,
>>>>>
>>>>> We still have the same problems with our test ceph cluster and didn't receive a reply from you after I sent you the requested log files. Do you know if it's possible to get our cephfs filesystem working again, or is it better to give up the files on cephfs and start over?
>>>>>
>>>>> We restarted the cluster several times but it's still degraded:
>>>>> [root@th1-mon001 ~]# ceph -w
>>>>>     cluster c78209f5-55ea-4c70-8968-2231d2b05560
>>>>>      health HEALTH_WARN mds cluster is degraded
>>>>>      monmap e3: 3 mons at {th1-mon001=10.1.2.21:6789/0,th1-mon002=10.1.2.22:6789/0,th1-mon003=10.1.2.23:6789/0}, election epoch 432, quorum 0,1,2 th1-mon001,th1-mon002,th1-mon003
>>>>>      mdsmap e190: 1/1/1 up {0=th1-mon001=up:replay}, 1 up:standby
>>>>>      osdmap e2248: 12 osds: 12 up, 12 in
>>>>>       pgmap v197548: 492 pgs, 4 pools, 60297 MB data, 470 kobjects
>>>>>             124 GB used, 175 GB / 299 GB avail
>>>>>                  491 active+clean
>>>>>                    1 active+clean+scrubbing+deep
>>>>>
>>>>> One placement group stays in the deep scrubbing phase.
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Jasper Siero
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: Jasper Siero
>>>>> Sent: Thursday, 21 August 2014 16:43
>>>>> To: Gregory Farnum
>>>>> Subject: RE: mds isn't working anymore after osd's running full
>>>>>
>>>>> I did restart it, and you are right that the epoch number has changed, but the situation looks the same.
>>>>> 2014-08-21 16:33:06.032366 7f9b5f3cd700  1 mds.0.27  need osdmap epoch 1994, have 1993
>>>>> 2014-08-21 16:33:06.032368 7f9b5f3cd700  1 mds.0.27  waiting for osdmap 1994 (which blacklists
>>>>> prior instance)
>>>>> I started the mds with the debug options and attached the log.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jasper
>>>>> ________________________________________
>>>>> From: Gregory Farnum [greg@xxxxxxxxxxx]
>>>>> Sent: Wednesday, 20 August 2014 18:38
>>>>> To: Jasper Siero
>>>>> CC: ceph-users@xxxxxxxxxxxxxx
>>>>> Subject: Re: mds isn't working anymore after osd's running full
>>>>>
>>>>> After restarting your MDS, it still says it has epoch 1832 and needs
>>>>> epoch 1833? I think you didn't really restart it.
>>>>> If the epoch numbers have changed, can you restart it with "debug mds
>>>>> = 20", "debug objecter = 20", "debug ms = 1" in the ceph.conf and post
>>>>> the resulting log file somewhere?
>>>>> -Greg
>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
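
When comparing MDS logs across restarts, it can help to pull the "need" and "have" epochs out of each line and check whether they actually moved. A minimal sketch, assuming the log line has the same wording as the lines pasted in this thread and has not been wrapped by the mail client; the function name is made up for illustration.

```python
import re

# Pattern taken from the wording of the MDS log lines quoted in this thread.
NEED_HAVE = re.compile(r"need osdmap epoch (\d+), have (\d+)")


def osdmap_gap(log_line):
    """Return (need, have) if the line reports an osdmap wait, else None."""
    match = NEED_HAVE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = ("2014-08-21 16:33:06.032366 7f9b5f3cd700  1 mds.0.27  "
        "need osdmap epoch 1994, have 1993")
print(osdmap_gap(line))  # (1994, 1993)
```

If the pair is unchanged after a restart, the daemon was probably not restarted; if both numbers moved but the gap remains, the situation matches what is described here.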
>>>>>
>>>>>
>>>>> On Wed, Aug 20, 2014 at 12:49 AM, Jasper Siero
>>>>> <jasper.siero@xxxxxxxxxxxxxxxxx> wrote:
>>>>>> Unfortunately that doesn't help. I restarted both the active and standby mds but that doesn't change the state of the mds. Is there a way to force the mds to look at the 1832 epoch (or earlier) instead of 1833 (need osdmap epoch 1833, have 1832)?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jasper
>>>>>> ________________________________________
>>>>>> From: Gregory Farnum [greg@xxxxxxxxxxx]
>>>>>> Sent: Tuesday, 19 August 2014 19:49
>>>>>> To: Jasper Siero
>>>>>> CC: ceph-users@xxxxxxxxxxxxxx
>>>>>> Subject: Re: mds isn't working anymore after osd's running full
>>>>>>
>>>>>> On Mon, Aug 18, 2014 at 6:56 AM, Jasper Siero
>>>>>> <jasper.siero@xxxxxxxxxxxxxxxxx> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> We have a small ceph cluster running version 0.80.1 with cephfs on five
>>>>>>> nodes.
>>>>>>> Last week some OSDs became full and shut themselves down. To help the OSDs
>>>>>>> start again, I added some extra OSDs and moved some placement group
>>>>>>> directories on the full OSDs (those that have a copy on another OSD) to
>>>>>>> another place on the node (as mentioned in
>>>>>>> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/).
>>>>>>> After clearing some space on the full OSDs I started them again. After a
>>>>>>> lot of deep scrubbing and two pg inconsistencies which needed to be repaired,
>>>>>>> everything looked fine except the mds, which is still in the replay state
>>>>>>> and stays that way.
>>>>>>> The log below says that the mds needs osdmap epoch 1833 and has 1832.
>>>>>>>
>>>>>>> 2014-08-18 12:29:22.268248 7fa786182700  1 mds.-1.0 handle_mds_map standby
>>>>>>> 2014-08-18 12:29:22.273995 7fa786182700  1 mds.0.25 handle_mds_map i am now
>>>>>>> mds.0.25
>>>>>>> 2014-08-18 12:29:22.273998 7fa786182700  1 mds.0.25 handle_mds_map state
>>>>>>> change up:standby --> up:replay
>>>>>>> 2014-08-18 12:29:22.274000 7fa786182700  1 mds.0.25 replay_start
>>>>>>> 2014-08-18 12:29:22.274014 7fa786182700  1 mds.0.25  recovery set is
>>>>>>> 2014-08-18 12:29:22.274016 7fa786182700  1 mds.0.25  need osdmap epoch 1833,
>>>>>>> have 1832
>>>>>>> 2014-08-18 12:29:22.274017 7fa786182700  1 mds.0.25  waiting for osdmap 1833
>>>>>>> (which blacklists prior instance)
>>>>>>>
>>>>>>>  # ceph status
>>>>>>>     cluster c78209f5-55ea-4c70-8968-2231d2b05560
>>>>>>>      health HEALTH_WARN mds cluster is degraded
>>>>>>>      monmap e3: 3 mons at
>>>>>>> {th1-mon001=10.1.2.21:6789/0,th1-mon002=10.1.2.22:6789/0,th1-mon003=10.1.2.23:6789/0},
>>>>>>> election epoch 362, quorum 0,1,2 th1-mon001,th1-mon002,th1-mon003
>>>>>>>      mdsmap e154: 1/1/1 up {0=th1-mon001=up:replay}, 1 up:standby
>>>>>>>      osdmap e1951: 12 osds: 12 up, 12 in
>>>>>>>       pgmap v193685: 492 pgs, 4 pools, 60297 MB data, 470 kobjects
>>>>>>>             124 GB used, 175 GB / 299 GB avail
>>>>>>>                  492 active+clean
>>>>>>>
>>>>>>> # ceph osd tree
>>>>>>> # id    weight    type name    up/down    reweight
>>>>>>> -1    0.2399    root default
>>>>>>> -2    0.05997        host th1-osd001
>>>>>>> 0    0.01999            osd.0    up    1
>>>>>>> 1    0.01999            osd.1    up    1
>>>>>>> 2    0.01999            osd.2    up    1
>>>>>>> -3    0.05997        host th1-osd002
>>>>>>> 3    0.01999            osd.3    up    1
>>>>>>> 4    0.01999            osd.4    up    1
>>>>>>> 5    0.01999            osd.5    up    1
>>>>>>> -4    0.05997        host th1-mon003
>>>>>>> 6    0.01999            osd.6    up    1
>>>>>>> 7    0.01999            osd.7    up    1
>>>>>>> 8    0.01999            osd.8    up    1
>>>>>>> -5    0.05997        host th1-mon002
>>>>>>> 9    0.01999            osd.9    up    1
>>>>>>> 10    0.01999            osd.10    up    1
>>>>>>> 11    0.01999            osd.11    up    1
>>>>>>>
>>>>>>> What is the way to get the mds up and running again?
>>>>>>>
>>>>>>> I still have all the placement group directories which I moved off the full
>>>>>>> OSDs (which were down) to create disk space.
>>>>>>
>>>>>> Try just restarting the MDS daemon. This sounds a little familiar so I
>>>>>> think it's a known bug which may be fixed in a later dev or point
>>>>>> release on the MDS, but it's a soft-state rather than a disk state
>>>>>> issue.
>>>>>> -Greg
>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com