Re: MDS damaged

On Thu, Oct 26, 2017 at 12:40 PM, Daniel Davidson
<danield@xxxxxxxxxxxxxxxx> wrote:
> And at the risk of bombing the mailing list, I can also see that the
> stray7_head omapkey is not being recreated:
> rados -p igbhome_data listomapkeys 100.00000000
> stray0_head
> stray1_head
> stray2_head
> stray3_head
> stray4_head
> stray5_head
> stray6_head
> stray8_head
> stray9_head

So if it's staying up for a little while, I'd issue a "ceph daemon
mds.<a> flush journal" to ensure it writes back the recreated stray
directory.  It's normal that the newly created dir (and its linkage)
doesn't appear in the backing store right away -- it is only created in
memory+journal at MDS startup.
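
A minimal sketch of how to verify it, assuming the active daemon is
mds.ceph-1 and the pool you queried above (adjust both to your setup):

# flush the journal so the recreated stray dir is written back
ceph daemon mds.ceph-1 flush journal

# the key should reappear in the ~mds0 directory object
rados -p igbhome_data listomapkeys 100.00000000 | grep stray7

If stray7_head shows up in that listing, the recreated directory has
made it to the backing store.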

John

>
>
>
>
> On 10/26/2017 05:08 AM, Daniel Davidson wrote:
>>
>> I increased the logging of the mds to try and get some more information.
>> I think the relevant lines are:
>>
>> 2017-10-26 05:03:17.661683 7f1c598a6700  0 mds.0.cache.dir(607) _fetched
>> missing object for [dir 607 ~mds0/stray7/ [2,head] auth v=108918871 cv=0/0
>> ap=1+0+0 state=1610645632 f(v1 m2017-10-25 14:56:13.140995 299=
>> 299+0) n(v1 rc2017-10-25 14:56:13.140995 b191590453903 299=299+0)
>> hs=0+11,ss=0+0 dirty=11 | child=1 sticky=1 dirty=1 waiter=1 authpin=1
>> 0x7f1c71e9f300]
>> 2017-10-26 05:03:17.661708 7f1c598a6700 -1 log_channel(cluster) log [ERR]
>> : dir 607 object missing on disk; some files may be lost (~mds0/stray7)
>> 2017-10-26 05:03:17.661711 7f1c598a6700 -1 mds.0.damage notify_dirfrag
>> Damage to fragment * of ino 607 is fatal because it is a system directory
>> for this rank
>>
>> I would be grateful for any help in repair,
>>
>> Dan
>>
>> On 10/25/2017 04:17 PM, Daniel Davidson wrote:
>>>
>>> A bit more news: I shut down the mds on ceph-0, started the mds on
>>> ceph-1, and then marked the mds as repaired.  Everything ran great for
>>> about 5 hours, but now it has crashed again with the same error:
>>>
>>> 2017-10-25 15:13:07.344093 mon.0 [INF] fsmap e121828: 1/1/1 up
>>> {0=ceph-1=up:active}
>>> 2017-10-25 15:13:07.383445 mds.0 [ERR] dir 607 object missing on disk;
>>> some files may be lost (~mds0/stray7)
>>> 2017-10-25 15:13:07.480785 mon.0 [INF] osdmap e35296: 32 osds: 32 up, 32
>>> in
>>> 2017-10-25 15:13:07.530337 mon.0 [INF] pgmap v28449919: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 5894 kB/s
>>> rd, 26 op/s
>>> 2017-10-25 15:13:08.473363 mon.0 [INF] mds.0 172.16.31.2:6802/3109594408
>>> down:damaged
>>> 2017-10-25 15:13:08.473487 mon.0 [INF] fsmap e121829: 0/1/1 up, 1 damaged
>>>
>>> If I:
>>> #  rados -p igbhome_data rmomapkey 100.00000000 stray7_head
>>> # ceph mds repaired 0
>>>
>>> then I get:
>>> 2017-10-25 16:11:52.219916 mds.0 [ERR] dir 607 object missing on disk;
>>> some files may be lost (~mds0/stray7)
>>> 2017-10-25 16:11:52.307975 mon.0 [INF] osdmap e35322: 32 osds: 32 up, 32
>>> in
>>> 2017-10-25 16:11:52.357904 mon.0 [INF] pgmap v28450567: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 11773 kB/s
>>> rd, 5262 kB/s wr, 26 op/s
>>> 2017-10-25 16:11:53.325331 mon.0 [INF] mds.0 172.16.31.2:6802/2716803172
>>> down:damaged
>>> 2017-10-25 16:11:53.325424 mon.0 [INF] fsmap e121882: 0/1/1 up, 1 damaged
>>> 2017-10-25 16:11:53.475087 mon.0 [INF] pgmap v28450568: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 39677 kB/s
>>> rd, 47236 B/s wr, 54 op/s
>>> 2017-10-25 16:11:54.590232 mon.0 [INF] pgmap v28450569: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 28105 kB/s
>>> rd, 3786 kB/s wr, 43 op/s
>>> 2017-10-25 16:11:55.719476 mon.0 [INF] pgmap v28450570: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 26284 kB/s
>>> rd, 3678 kB/s wr, 357 op/s
>>> 2017-10-25 16:11:56.830623 mon.0 [INF] pgmap v28450571: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 37249 kB/s
>>> rd, 5476 B/s wr, 358 op/s
>>> 2017-10-25 16:11:57.965330 mon.0 [INF] pgmap v28450572: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 60769 kB/s
>>> rd, 53485 B/s wr, 41 op/s
>>> 2017-10-25 16:11:58.033787 mon.0 [INF] mds.? 172.16.31.2:6802/2725942008
>>> up:boot
>>> 2017-10-25 16:11:58.033876 mon.0 [INF] fsmap e121883: 0/1/1 up, 1
>>> up:standby, 1 damaged
>>>
>>> Dan
>>>
>>> On 10/25/2017 11:30 AM, Daniel Davidson wrote:
>>>>
>>>> The system is down again saying it is missing the same stray7 again.
>>>>
>>>> 2017-10-25 11:24:29.736774 mds.0 [WRN] failed to reconnect caps for
>>>> missing inodes:
>>>> 2017-10-25 11:24:29.736779 mds.0 [WRN]  ino 100147160e6
>>>> 2017-10-25 11:24:29.753665 mds.0 [ERR] dir 607 object missing on disk;
>>>> some files may be lost (~mds0/stray7)
>>>>
>>>>
>>>> Dan
>>>>
>>>> On 10/25/2017 08:54 AM, Daniel Davidson wrote:
>>>>>
>>>>> Thanks for the information.
>>>>>
>>>>> I did:
>>>>> # ceph daemon mds.ceph-0 scrub_path / repair recursive
>>>>>
>>>>> Saw in the logs it finished
>>>>>
>>>>> # ceph daemon mds.ceph-0 flush journal
>>>>>
>>>>> Saw in the logs it finished
>>>>>
>>>>> #ceph mds fail 0
>>>>> #ceph mds repaired 0
>>>>>
>>>>> And it went back to missing stray7 again.  I added that back as we
>>>>> did earlier and the system is back online again, but the metadata
>>>>> errors still exist.
>>>>>
>>>>> Dan
>>>>>
>>>>> On 10/25/2017 07:50 AM, John Spray wrote:
>>>>>>
>>>>>> Commands that start with "ceph daemon" take mds.<name> rather than a
>>>>>> rank (notes on terminology here:
>>>>>> http://docs.ceph.com/docs/master/cephfs/standby/).  The name is how
>>>>>> you would refer to the daemon from systemd; by default it is often set
>>>>>> to the hostname of the machine where the daemon is running.
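>>>>>>
>>>>>> A quick way to find the right <name>, as a sketch assuming the default
>>>>>> admin socket location, is to list the sockets on the host running the
>>>>>> daemon:
>>>>>>
>>>>>> ls /var/run/ceph/ | grep mds
>>>>>> # e.g. a file named ceph-mds.ceph-0.asok means you would run:
>>>>>> ceph daemon mds.ceph-0 scrub_path / repair recursive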
>>>>>>
>>>>>> John
>>>>>>
>>>>>> On Wed, Oct 25, 2017 at 2:30 PM, Daniel Davidson
>>>>>> <danield@xxxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> I do have a problem with running the commands you mentioned to repair
>>>>>>> the mds:
>>>>>>>
>>>>>>> # ceph daemon mds.0 scrub_path
>>>>>>> admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
>>>>>>> admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
>>>>>>>
>>>>>>> Any idea why that is not working?
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10/25/2017 06:45 AM, Daniel Davidson wrote:
>>>>>>>>
>>>>>>>> John, thank you so much.  After running the initial rados command you
>>>>>>>> mentioned, it is back up and running.  It did complain about duplicate
>>>>>>>> inodes for a bunch of files which frankly are not important, but I will
>>>>>>>> run those repair and scrub commands you mentioned and get it back clean
>>>>>>>> again.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>> On 10/25/2017 03:55 AM, John Spray wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Oct 24, 2017 at 7:14 PM, Daniel Davidson
>>>>>>>>> <danield@xxxxxxxxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> Our ceph system is having a problem.
>>>>>>>>>>
>>>>>>>>>> A few days ago we had a pg that was marked as inconsistent, and
>>>>>>>>>> today I fixed it with:
>>>>>>>>>>
>>>>>>>>>> #ceph pg repair 1.37c
>>>>>>>>>>
>>>>>>>>>> then a file was stuck as missing so I did a:
>>>>>>>>>>
>>>>>>>>>> #ceph pg 1.37c mark_unfound_lost delete
>>>>>>>>>> pg has 1 objects unfound and apparently lost marking
>>>>>>>>>
>>>>>>>>> OK, so "fixed" might be a bit of an overstatement here: while the
>>>>>>>>> PG is considered healthy, from CephFS's point of view what happened
>>>>>>>>> was that some of its metadata just got blown away.
>>>>>>>>>
>>>>>>>>> There are some (in fact most) objects that CephFS can do without (it
>>>>>>>>> will just return EIO when you try to read that file/dir), but some
>>>>>>>>> objects are essential, and losing one of those will cause a whole MDS
>>>>>>>>> rank to be damaged (unstartable) -- that's what has happened in your
>>>>>>>>> case.
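>>>>>>>>>
>>>>>>>>> (As a side note -- a sketch, and the exact output varies by version --
>>>>>>>>> the MDS keeps a damage table you can inspect from its admin socket to
>>>>>>>>> see what it has flagged:
>>>>>>>>>
>>>>>>>>> ceph daemon mds.<name> damage ls
>>>>>>>>>
>>>>>>>>> Each entry records the damage type and the ino/frag it refers to.)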
>>>>>>>>>
>>>>>>>>>> That fixed the unfound file problem and all the pgs went
>>>>>>>>>> active+clean.  A few minutes later though, the FS seemed to pause and
>>>>>>>>>> the MDS started giving errors.
>>>>>>>>>>
>>>>>>>>>> # ceph -w
>>>>>>>>>>       cluster 7bffce86-9d7b-4bdf-a9c9-67670e68ca77
>>>>>>>>>>        health HEALTH_ERR
>>>>>>>>>>               mds rank 0 is damaged
>>>>>>>>>>               mds cluster is degraded
>>>>>>>>>>               noscrub,nodeep-scrub flag(s) set
>>>>>>>>>>        monmap e3: 4 mons at
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> {ceph-0=172.16.31.1:6789/0,ceph-1=172.16.31.2:6789/0,ceph-2=172.16.31.3:6789/0,ceph-3=172.16.31.4:6789/0}
>>>>>>>>>>               election epoch 652, quorum 0,1,2,3
>>>>>>>>>> ceph-0,ceph-1,ceph-2,ceph-3
>>>>>>>>>>         fsmap e121409: 0/1/1 up, 4 up:standby, 1 damaged
>>>>>>>>>>        osdmap e35220: 32 osds: 32 up, 32 in
>>>>>>>>>>               flags
>>>>>>>>>> noscrub,nodeep-scrub,sortbitwise,require_jewel_osds
>>>>>>>>>>         pgmap v28398840: 1536 pgs, 2 pools, 795 TB data, 329
>>>>>>>>>> Mobjects
>>>>>>>>>>               1595 TB used, 1024 TB / 2619 TB avail
>>>>>>>>>>                   1536 active+clean
>>>>>>>>>>
>>>>>>>>>> Looking into the logs when I try a:
>>>>>>>>>>
>>>>>>>>>> #ceph mds repaired 0
>>>>>>>>>>
>>>>>>>>>> 2017-10-24 12:01:27.354271 mds.0 172.16.31.3:6801/1949050374 75 :
>>>>>>>>>> cluster
>>>>>>>>>> [ERR] dir 607 object missing on disk; some files may be lost
>>>>>>>>>> (~mds0/stray7)
>>>>>>>>>>
>>>>>>>>>> Any ideas as for what to do next, I am stumped.
>>>>>>>>>
>>>>>>>>> So if this is really the only missing object, then it's your lucky
>>>>>>>>> day: you lost a stray directory, which usually contains just deleted
>>>>>>>>> files (it can contain something more important if you've had hardlinks
>>>>>>>>> whose original file was later deleted).
>>>>>>>>>
>>>>>>>>> The MDS goes damaged if it has a reference to a stray directory but
>>>>>>>>> the directory object isn't found.  OTOH, if there is no reference to
>>>>>>>>> the stray directory, it will happily recreate it for you.  So, you can
>>>>>>>>> do this:
>>>>>>>>> rados -p <your metadata pool> rmomapkey 100.00000000 stray7_head
>>>>>>>>>
>>>>>>>>> ...to prompt the MDS to recreate the stray directory (the arguments
>>>>>>>>> there are the magic internal names for ~mds0/stray7).
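>>>>>>>>>
>>>>>>>>> (To double-check what is there before removing anything, you can list
>>>>>>>>> the keys on the same object first -- a minimal sketch:
>>>>>>>>>
>>>>>>>>> rados -p <your metadata pool> listomapkeys 100.00000000
>>>>>>>>>
>>>>>>>>> Normally you would see stray0_head through stray9_head; stray7_head is
>>>>>>>>> the one pointing at the lost dir object.)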
>>>>>>>>>
>>>>>>>>> Then, if that was the only damage, your MDS will come up after you
>>>>>>>>> run
>>>>>>>>> "ceph mds repaired 0".
>>>>>>>>>
>>>>>>>>> There will still be some inconsistency resulting from removing the
>>>>>>>>> stray dir, and possibly also from the disaster recovery tools that
>>>>>>>>> you've run since, so you'll want to do a "ceph daemon mds.<id>
>>>>>>>>> scrub_path / repair recursive".  This will probably output a bunch of
>>>>>>>>> messages to the cluster log about things that it is repairing.  Then
>>>>>>>>> do "ceph daemon mds.<id> flush journal" to flush out the repairs it
>>>>>>>>> has made, and restart the MDS daemon one more time ("ceph mds fail
>>>>>>>>> 0").
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>> Dan
>>>>
>>>>
>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


