Re: mds damage detected - Jewel

On Fri, Sep 16, 2016 at 7:15 PM, Jim Kilborn <jim@xxxxxxxxxxxx> wrote:
> John,
>
> thanks for the tips.
>
> I ran a recursive long listing of the cephfs volume, and didn’t receive any errors. So I guess it wasn’t serious.

Interesting, so it sounds like you hit a transient RADOS read issue.
Keep an eye out for similar events, and see if you can grab any OSD
logs from around the time it happens.
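
For example, on each OSD host, something like this (assuming the
default log location under /var/log/ceph; adjust if yours differs):

  grep -iE 'error|fail' /var/log/ceph/ceph-osd.*.log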

>
> I also tried running the following:
>
> ceph tell mds.0 damage ls
>
> 2016-09-16 07:11:36.824330 7fc2ff00e700  0 client.224234 ms_handle_reset on 192.168.19.243:6804/3448
>
> Error EPERM: problem getting command descriptions from mds.0
>
> I’m guessing this output means something didn’t work. Any other suggestions with this command?

If you are running on a node that doesn't have administrative keys
(e.g. your MDS) then it will give you EPERM -- in that case, run it
somewhere you have an admin key, usually one of your mons.

It may also give you EPERM if you installed this cluster with an older
version of Ceph and your client.admin key has "allow" MDS auth caps
instead of "allow *" (in this case you can update it from the CLI as
per http://docs.ceph.com/docs/master/rados/operations/user-management/)
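
For example, something along these lines should do it (note that "ceph
auth caps" replaces the whole capability set, so check your existing
mon/osd caps first with "ceph auth get client.admin" and carry them over):

  ceph auth caps client.admin mds 'allow *' mon 'allow *' osd 'allow *'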

John

>
> Thanks,
>
> Jim
>
> From: John Spray <jspray@xxxxxxxxxx>
> Sent: Friday, September 16, 2016 2:37 AM
> To: Jim Kilborn <jim@xxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: mds damage detected - Jewel
>
> On Thu, Sep 15, 2016 at 10:30 PM, Jim Kilborn <jim@xxxxxxxxxxxx> wrote:
>> I have a replicated cache pool and metadata pool which reside on ssd drives, with a size of 2, backed by an erasure-coded data pool.
>> The cephfs filesystem was in a healthy state. I pulled an SSD drive, to perform an exercise in osd failure.
>>
>> The cluster recognized the ssd failure, and replicated back to a healthy state, but I got a message saying "mds0: Metadata damage detected".
>>
>>
>>    cluster 62ed97d6-adf4-12e4-8fd5-3d9701b22b86
>>      health HEALTH_ERR
>>             mds0: Metadata damage detected
>>             mds0: Client master01.div18.swri.org failing to respond to cache pressure
>>      monmap e2: 3 mons at {ceph01=192.168.19.241:6789/0,ceph02=192.168.19.242:6789/0,ceph03=192.168.19.243:6789/0}
>>             election epoch 24, quorum 0,1,2 ceph01,darkjedi-ceph02,darkjedi-ceph03
>>       fsmap e25: 1/1/1 up {0=-ceph04=up:active}, 1 up:standby
>>      osdmap e1327: 20 osds: 20 up, 20 in
>>             flags sortbitwise
>>       pgmap v11630: 1536 pgs, 3 pools, 100896 MB data, 442 kobjects
>>             201 GB used, 62915 GB / 63116 GB avail
>>                 1536 active+clean
>>
>> In the mds logs of the active mds, I see the following:
>>
>> 7fad0c4b2700  0 -- 192.168.19.244:6821/17777 >> 192.168.19.243:6805/5090 pipe(0x7fad25885400 sd=56 :33513 s=1 pgs=0 cs=0 l=1 c=0x7fad2585f980).fault
>> 7fad14add700  0 mds.beacon.darkjedi-ceph04 handle_mds_beacon no longer laggy
>> 7fad101d3700  0 mds.0.cache.dir(10000016c08) _fetched missing object for [dir 10000016c08 /usr/ [2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741952 f() n() hs=0+0,ss=0+0 | waiter=1 authpin=1 0x7fad25ced500]
>> 7fad101d3700 -1 log_channel(cluster) log [ERR] : dir 10000016c08 object missing on disk; some files may be lost
>> 7fad0f9d2700  0 -- 192.168.19.244:6821/17777 >> 192.168.19.242:6800/3746 pipe(0x7fad25a4e800 sd=42 :0 s=1 pgs=0 cs=0 l=1 c=0x7fad25bd5180).fault
>> 7fad14add700 -1 log_channel(cluster) log [ERR] : unmatched fragstat size on single dirfrag 10000016c08, inode has f(v0 m2016-09-14 14:00:36.654244 13=1+12), dirfrag has f(v0 m2016-09-14 14:00:36.654244 1=0+1)
>> 7fad14add700 -1 log_channel(cluster) log [ERR] : unmatched rstat rbytes on single dirfrag 10000016c08, inode has n(v77 rc2016-09-14 14:00:36.654244 b1533163206 48173=43133+5040), dirfrag has n(v77 rc2016-09-14 14:00:36.654244 1=0+1)
>> 7fad101d3700 -1 log_channel(cluster) log [ERR] : unmatched rstat on 10000016c08, inode has n(v78 rc2016-09-14 14:00:36.656244 2=0+2), dirfrags have n(v0 rc2016-09-14 14:00:36.656244 3=0+3)
>>
>> I’m not sure why the metadata got damaged, since it’s being replicated, but I want to fix the issue and test again. However, I can’t figure out the steps to repair the metadata.
>
> Losing an object like that is almost certainly a sign that you've hit
> a bug -- probably an OSD bug if it was the OSDs being disrupted while
> the MDS daemons continued to run.
>
> The subsequent "unmatched fragstat" etc. messages are probably a red
> herring: the stats are only bad because the object is missing,
> not because of some other issue (http://tracker.ceph.com/issues/17284)
>
>> I saw something about running a damage ls, but I can’t seem to find a more detailed repair document. Any pointers to get the metadata fixed? Both my mds daemons seem to be running correctly, but that error bothers me. It shouldn’t happen, I think.
>
> You can get the detail on what's damaged with "ceph tell mds.<id>
> damage ls" -- this spits out JSON that you may well want to parse with
> a tiny python script.
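>
> A rough sketch along these lines might do (the field names below, like
> "damage_type", are what I'd expect to see -- adjust them to whatever
> your output actually contains):
>
>     #!/usr/bin/env python
>     import json
>     import subprocess
>
>     # Ask the active MDS for its damage list and summarize the JSON
>     # array it returns; each entry describes one recorded damage item.
>     out = subprocess.check_output(["ceph", "tell", "mds.0", "damage", "ls"])
>     entries = json.loads(out.decode("utf-8"))
>     for entry in entries:
>         print("%s: id=%s path=%s" % (entry.get("damage_type"),
>                                      entry.get("id"),
>                                      entry.get("path", "?")))
>     print("%d damage entries total" % len(entries))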
>
>>
>> I tried the following command, but it doesn’t understand it….
>> ceph --admin-daemon /var/run/ceph/ceph-mds. ceph03.asok damage ls
>>
>>
>> I then rebooted all 4 ceph servers simultaneously (another stress test), and the ceph cluster came back up healthy, and the mds damaged status had been cleared!! I then replaced the ssd, put it back into service, and let the backfill complete. The cluster was fully healthy. I pulled another ssd, and repeated this process, yet I never got the damaged mds messages. Was this just random metadata damage due to yanking a drive out? Are there any lingering effects on the metadata that I need to address?
>
> The MDS damage table is an ephemeral structure, so when you reboot it
> will forget about the damage.  I would expect that doing an "ls -R" on
> your filesystem will cause the damage to be detected again as it
> traverses the filesystem, although if it doesn't then that would be a
> sign that the "missing object" was actually a bug failing to find it
> one time, rather than a bug where the object has really been lost.
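>
> For instance, something like this (assuming your cephfs is mounted at
> /mnt/cephfs -- adjust the path and the mds id to match your setup):
>
>   ls -R /mnt/cephfs > /dev/null
>   ceph tell mds.0 damage ls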
>
> John
>
>>
>>
>> - Jim
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



