Hi,
without any preceding outage or disaster, our CephFS (17.2.5, deployed with cephadm) is reporting damaged metadata:
[root@ceph106 ~]# zcat /var/log/ceph/3cacfa58-55cf-11ed-abaf-5cba2c03dec0/ceph-mds.disklib.ceph106.kbzjbg.log-20221211.gz
2022-12-10T10:12:35.161+0000 7fa46779d700 1 mds.disklib.ceph106.kbzjbg Updating MDS map to version 958 from mon.1
2022-12-10T10:12:50.974+0000 7fa46779d700 1 mds.disklib.ceph106.kbzjbg Updating MDS map to version 959 from mon.1
2022-12-10T15:18:36.609+0000 7fa461791700 0 mds.0.cache.dir(0x100001516b1) _fetched missing object for [dir 0x100001516b1 /volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC/V_7770505/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741888|fetching f() n() hs=0+0,ss=0+0 | waiter=1 authpin=1 0x56541d3c5a80]
2022-12-10T15:18:36.615+0000 7fa461791700 -1 log_channel(cluster) log [ERR] : dir 0x100001516b1 object missing on disk; some files may be lost (/volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC/V_7770505)
2022-12-10T15:18:40.010+0000 7fa46779d700 1 mds.disklib.ceph106.kbzjbg Updating MDS map to version 960 from mon.1
2022-12-11T02:32:01.474+0000 7fa468fa0700 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
[root@ceph101 ~]# ceph tell mds.disklib:0 damage ls
2022-12-12T10:20:42.484+0100 7fa9e37fe700 0 client.165258 ms_handle_reset on v2:xxx.xxx.xxx.xxx:6800/519677707
2022-12-12T10:20:42.504+0100 7fa9e37fe700 0 client.165264 ms_handle_reset on v2:xxx.xxx.xxx.xxx:6800/519677707
[
    {
        "damage_type": "dir_frag",
        "id": 2085830739,
        "ino": 1099513009841,
        "frag": "*",
        "path": "/volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC/V_7770505"
    }
]
The reported path CV_MAGNETIC/V_7770505 is no longer visible, but I can't
tell whether that is because the directory was lost or because it was
removed by the application using the CephFS.
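As far as I understand, a directory fragment is stored as an omap object in the metadata pool named <inode-in-hex>.<frag>, so for inode 1099513009841 (0x100001516b1, matching the MDS log above) and an unfragmented directory that would be 100001516b1.00000000. If that is right, I suppose one could check for the object directly, something like this (the pool name is only my guess, "ceph fs ls" shows the real one):

  rados -p cephfs.disklib.meta stat 100001516b1.00000000          # does the dirfrag object still exist?
  rados -p cephfs.disklib.meta listomapkeys 100001516b1.00000000  # if so, its omap keys are the dentries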
Data is on an EC 4+2 pool; the root and metadata pools are replica=3.
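Since the metadata pool is replicated, I would expect RADOS itself to flag any real object loss; I suppose something like this would rule that out (pool name again only a guess):

  ceph health detail                               # would report unfound objects / damaged PGs
  rados list-inconsistent-pg cephfs.disklib.meta   # PGs with scrub-detected inconsistencies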
My questions are: What happened, and how do I fix it?
Is running "ceph tell mds.disklib:0 scrub start <path> recursive,repair"
the right thing to do, and if so, on which path (a rough sketch of what I
have in mind is below)? Is that command safe, and what is its impact on
production? Can the file system stay mounted and in use by clients while
it runs? How long would it take for 340 TB? And what exactly is
"dir_frag" damage?
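For concreteness, roughly this (fs name and path taken from the damage report above; I'm not sure whether the damaged directory itself or its parent is the right scrub target):

  ceph tell mds.disklib:0 scrub start /volumes/_nogroup/ec-pool4p2/aa36abb9-a22e-405f-921c-76152599c6ba/LQ1WYG_10.28.2022_04.50/CV_MAGNETIC recursive,repair
  ceph tell mds.disklib:0 scrub status   # watch progress / completion
  ceph tell mds.disklib:0 damage ls      # re-check the damage table afterwards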
TIA, Sascha.