Hi there,

You might want to look at [1] for this; I also found a relevant thread [2] that could be helpful.

[1] https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts
[2] https://www.spinics.net/lists/ceph-users/msg53202.html

- Dhairya

On Mon, Dec 12, 2022 at 7:10 PM Sascha Lucas <ceph-users@xxxxxxxxx> wrote:
> Hi,
>
> Without any outage/disaster, cephFS (17.2.5/cephadm) reports damaged
> metadata:
>
> [root@ceph106 ~]# zcat /var/log/ceph/3cacfa58-55cf-11ed-abaf-5cba2c03dec0/ceph-mds.disklib.ceph106.kbzjbg.log-20221211.gz
> 2022-12-10T10:12:35.161+0000 7fa46779d700  1 mds.disklib.ceph106.kbzjbg Updating MDS map to version 958 from mon.1
> 2022-12-10T10:12:50.974+0000 7fa46779d700  1 mds.disklib.ceph106.kbzjbg Updating MDS map to version 959 from mon.1
> 2022-12-10T15:18:36.609+0000 7fa461791700  0 mds.0.cache.dir(0x100001516b1) _fetched missing object for [dir 0x100001516b1 /volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC/V_7770505/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741888|fetching f() n() hs=0+0,ss=0+0 | waiter=1 authpin=1 0x56541d3c5a80]
> 2022-12-10T15:18:36.615+0000 7fa461791700 -1 log_channel(cluster) log [ERR] : dir 0x100001516b1 object missing on disk; some files may be lost (/volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC/V_7770505)
> 2022-12-10T15:18:40.010+0000 7fa46779d700  1 mds.disklib.ceph106.kbzjbg Updating MDS map to version 960 from mon.1
> 2022-12-11T02:32:01.474+0000 7fa468fa0700 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
>
> [root@ceph101 ~]# ceph tell mds.disklib:0 damage ls
> 2022-12-12T10:20:42.484+0100 7fa9e37fe700  0 client.165258 ms_handle_reset on v2:xxx.xxx.xxx.xxx:6800/519677707
> 2022-12-12T10:20:42.504+0100 7fa9e37fe700  0 client.165264 ms_handle_reset on v2:xxx.xxx.xxx.xxx:6800/519677707
> [
>     {
>         "damage_type": "dir_frag",
>         "id": 2085830739,
>         "ino": 1099513009841,
>         "frag": "*",
>         "path": "/volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC/V_7770505"
>     }
> ]
>
> The mentioned path CV_MAGNETIC/V_7770505 is not visible, but I can't
> tell whether that is because it was lost or because it was removed by
> the application using the cephFS.
>
> Data is on an EC 4+2 pool; ROOT and METADATA are on replica=3 pools.
>
> The questions are: what happened, and how can the problem be fixed?
>
> Is running "ceph tell mds.disklib:0 scrub start /what/path?
> recursive,repair" the right thing? Is this a safe command? What is the
> impact on production?
>
> Can the file system stay mounted and used by clients? How long will it
> take for 340T? What is a dir_frag damage?
>
> TIA, Sascha.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
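For reference, a rough sketch of the forward-scrub workflow described in the CephFS documentation, filled in with the filesystem name (disklib), damaged path and damage id taken from the output quoted above. Whether a recursive repair scrub actually resolves this particular dir_frag damage is exactly the open question in this thread, so treat these commands as an illustration of the syntax rather than a recommendation:

  # List the current damage entries (same command Sascha already ran).
  ceph tell mds.disklib:0 damage ls

  # Start a recursive scrub with repair on the affected subtree; rank 0 of
  # the "disklib" filesystem and the path from the damage entry are assumed.
  ceph tell mds.disklib:0 scrub start /volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC/V_7770505 recursive,repair

  # Check scrub progress.
  ceph tell mds.disklib:0 scrub status

  # If the underlying problem has been dealt with, the stale entry can be
  # removed from the damage table by its id (2085830739 in the listing above).
  ceph tell mds.disklib:0 damage rm 2085830739

The forward scrub runs inside the MDS on the live filesystem, so clients do not normally have to unmount while it runs, and because it walks metadata rather than reading back file contents the runtime should depend mainly on the number of files and directories under the scrubbed path rather than on the raw 340T of data.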