Re: MDS_DAMAGE dir_frag


Hi there,

You might want to look at [1] for this; I also found a relevant thread [2]
that could be helpful.

[1]
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts
[2] https://www.spinics.net/lists/ceph-users/msg53202.html
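If you do end up scrubbing, the usual sequence (a sketch only, based on the
scrub commands documented alongside [1]; adjust the filesystem name, rank,
and path to your setup) would look something like:

```shell
# List the current damage entries for rank 0 of the "disklib" filesystem
ceph tell mds.disklib:0 damage ls

# Scrub the affected subtree recursively and attempt repairs
ceph tell mds.disklib:0 scrub start \
    /volumes/_nogroup/ec-pool4p2 recursive,repair

# Watch scrub progress
ceph tell mds.disklib:0 scrub status

# Once the scrub completes cleanly, clear the resolved damage entry
# (the id below is the one from your "damage ls" output)
ceph tell mds.disklib:0 damage rm 2085830739
```

Note that scrub repair handles some forms of metadata damage, but a dir
object that is genuinely missing from the metadata pool may still require
the disaster-recovery tooling described in [1].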

- Dhairya


On Mon, Dec 12, 2022 at 7:10 PM Sascha Lucas <ceph-users@xxxxxxxxx> wrote:

> Hi,
>
> without any outage/disaster cephFS (17.2.5/cephadm) reports damaged
> metadata:
>
> [root@ceph106 ~]# zcat
> /var/log/ceph/3cacfa58-55cf-11ed-abaf-5cba2c03dec0/ceph-mds.disklib.ceph106.kbzjbg.log-20221211.gz
> 2022-12-10T10:12:35.161+0000 7fa46779d700  1 mds.disklib.ceph106.kbzjbg
> Updating MDS map to version 958 from mon.1
> 2022-12-10T10:12:50.974+0000 7fa46779d700  1 mds.disklib.ceph106.kbzjbg
> Updating MDS map to version 959 from mon.1
> 2022-12-10T15:18:36.609+0000 7fa461791700  0
> mds.0.cache.dir(0x100001516b1) _fetched missing object for [dir
> 0x100001516b1
> /volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC/V_7770505/
> [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741888|fetching f() n()
> hs=0+0,ss=0+0 | waiter=1 authpin=1 0x56541d3c5a80]
> 2022-12-10T15:18:36.615+0000 7fa461791700 -1 log_channel(cluster) log
> [ERR] : dir 0x100001516b1 object missing on disk; some files may be lost
> (/volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC/V_7770505)
> 2022-12-10T15:18:40.010+0000 7fa46779d700  1 mds.disklib.ceph106.kbzjbg
> Updating MDS map to version 960 from mon.1
> 2022-12-11T02:32:01.474+0000 7fa468fa0700 -1 received  signal: Hangup from
> Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() )
> UID: 0
>
> [root@ceph101 ~]# ceph tell mds.disklib:0 damage ls
> 2022-12-12T10:20:42.484+0100 7fa9e37fe700  0 client.165258 ms_handle_reset
> on v2:xxx.xxx.xxx.xxx:6800/519677707
> 2022-12-12T10:20:42.504+0100 7fa9e37fe700  0 client.165264 ms_handle_reset
> on v2:xxx.xxx.xxx.xxx:6800/519677707
> [
>      {
>          "damage_type": "dir_frag",
>          "id": 2085830739,
>          "ino": 1099513009841,
>          "frag": "*",
>          "path":
> "/volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC/V_7770505"
>      }
> ]
>
> The mentioned path CV_MAGNETIC/V_7770505 is no longer visible, but I
> can't tell whether it was lost or removed by the application using
> CephFS.
>
> Data is on EC4+2 pool, ROOT and METADATA are on replica=3 pools.
>
> Questions are: What happened? And how to fix the problem?
>
> Is running "ceph tell mds.disklib:0 scrub start /what/path?
> recursive,repair" the right thing? Is this a safe command? What is the
> impact on production?
>
> Can the file-system stay mounted and used by clients? How long will it
> take for 340T? What is dir_frag damage?
>
> TIA, Sascha.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>


