Hi Odair,

On Thu, Oct 12, 2023 at 11:58 PM Odair M. <omdjunior@xxxxxxxxxxx> wrote:
>
> Hello,
>
> I've encountered an issue where the metadata pool has corrupted a cache
> inode, leading to an MDS rank abort in the 'reconnect' state. To address
> this, I'm following the "USING AN ALTERNATE METADATA POOL FOR RECOVERY"
> section from the documentation [1].

Using an alternate metadata pool for recovery hasn't been tested to any
great extent, which is why the document you refer to carries a warning
about it.

> However, I've observed that the cephfs-data-scan scan_links step has
> been running for over 24 hours on 35 TB of data, which is replicated
> across 3 OSDs, resulting in more than 100 TB of raw data. Does anyone
> have an estimation on the duration for this step?

scan_links has to iterate over every object in the metadata pool and,
for each object, over its omap key/values, so this step scales with the
number of objects in the metadata pool, i.e., the number of directories
and files in the file system. It's a bit hard to provide a time
estimate, but progress reporting is a feature we would like to add to
these tools.

> Additional detail: The corrupted mds log:
>
>     -9> 2023-10-11T10:13:22.254-0300 7ff901f75700 10 monclient:
> get_auth_request con 0x559bf41e4400 auth_method 0
>     -8> 2023-10-11T10:13:22.254-0300 7ff8ff770700  5 mds.barril12
> handle_mds_map old map epoch 472481 <= 472481, discarding
>     -7> 2023-10-11T10:13:22.254-0300 7ff8ff770700  0 mds.0.cache
> missing dir for * (which maps to *) on [inode 0x10021afaf90

I'm not sure what happened, and since this is a recent Ceph version
(17.2.6), we should put the details in a tracker to get an RCA of what
could have caused this.
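On the estimation point: a rough upper bound on the work scan_links has
to do is the object count of the metadata pool, e.g. via
`rados -p cephfs_metadata ls | wc -l` (substitute your pool name). With
that count and an observed processing rate you can sketch a crude ETA.
The numbers below are hypothetical placeholders, not measurements from
your cluster:

```shell
# Crude ETA sketch; both values are hypothetical placeholders.
objects=50000000   # e.g. from: rados -p cephfs_metadata ls | wc -l
rate=600           # objects/s, estimated from how fast the tool advances
echo "$(( objects / rate / 3600 )) hours remaining"   # prints "23 hours remaining"
```

This is only a back-of-the-envelope bound: omap sizes vary per object,
so the actual rate is not constant across the pool.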
> [...392,head] /dbteamvenv/ auth v98534854 snaprealm=0x559bf427ce00 f(v60
> m2023-10-06T15:35:03.278089-0300 9=0+9) n(v141971
> rc2023-10-09T18:41:19.742089-0300 b1424948533453 139810=131460+8350)
> (iversion lock) 0x559bf4298580]
>     -6> 2023-10-11T10:13:22.254-0300 7ff8ff770700  0 mds.0.cache
> missing dir ino 0x20005dd786b
>     -5> 2023-10-11T10:13:22.254-0300 7ff902776700 10 monclient:
> get_auth_request con 0x559bf4142c00 auth_method 0
>     -4> 2023-10-11T10:13:22.258-0300 7ff902f77700  5 mds.beacon.barril12
> received beacon reply up:rejoin seq 4 rtt 1.09601
>     -3> 2023-10-11T10:13:22.258-0300 7ff8ff770700 -1
> ./src/mds/MDCache.cc: In function 'void
> MDCache::handle_cache_rejoin_weak(ceph::cref_t<MMDSCacheRejoin>&)'
> thread 7ff8ff770700 time 2023-10-11T10:13:22.259535-0300
> ./src/mds/MDCache.cc: 4462: FAILED ceph_assert(diri)
>
>  ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x124) [0x7ff904a5b282]
>  2: /usr/lib/ceph/libceph-common.so.2(+0x25b420) [0x7ff904a5b420]
>  3:
> (MDCache::handle_cache_rejoin_weak(boost::intrusive_ptr<MMDSCacheRejoin
> const> const&)+0x20de) [0x559bf0a9da6e]
>  4: (MDCache::dispatch(boost::intrusive_ptr<Message const>
> const&)+0x424) [0x559bf0aa2a64]
>  5: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&,
> bool)+0x5c0) [0x559bf0930580]
>  6: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const>
> const&)+0x58) [0x559bf0930b78]
>  7: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message>
> const&)+0x1bf) [0x559bf090b5df]
>  8: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message>
> const&)+0x468) [0x7ff904ca71d8]
>  9: (DispatchQueue::entry()+0x5ef) [0x7ff904ca48df]
>  10: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ff904d681cd]
>  11: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7ff905680ea7]
>  12: clone()
>
>     -2> 2023-10-11T10:13:22.258-0300 7ff902f77700 10 monclient:
> get_auth_request con
> 0x559bf41e4c00 auth_method 0
>     -1> 2023-10-11T10:13:22.258-0300 7ff902f77700 10 monclient:
> get_auth_request con 0x559bf41e5400 auth_method 0
>      0> 2023-10-11T10:13:22.262-0300 7ff8ff770700 -1 *** Caught signal
> (Aborted) **
>  in thread 7ff8ff770700 thread_name:ms_dispatch
>
>  ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
> (stable)
>  1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7ff90568c140]
>  2: gsignal()
>  3: abort()
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x17e) [0x7ff904a5b2dc]
>  5: /usr/lib/ceph/libceph-common.so.2(+0x25b420) [0x7ff904a5b420]
>  6:
> (MDCache::handle_cache_rejoin_weak(boost::intrusive_ptr<MMDSCacheRejoin
> const> const&)+0x20de) [0x559bf0a9da6e]
>  7: (MDCache::dispatch(boost::intrusive_ptr<Message const>
> const&)+0x424) [0x559bf0aa2a64]
>  8: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&,
> bool)+0x5c0) [0x559bf0930580]
>  9: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const>
> const&)+0x58) [0x559bf0930b78]
>  10: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message>
> const&)+0x1bf) [0x559bf090b5df]
>  11: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message>
> const&)+0x468) [0x7ff904ca71d8]
>  12: (DispatchQueue::entry()+0x5ef) [0x7ff904ca48df]
>  13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ff904d681cd]
>  14: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7ff905680ea7]
>  15: clone()
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
>
> Ceph cluster status:
>
> barril1:~# ceph status
>   cluster:
>     id:     c30ecc8d-440e-4608-b3fe-5020337ae11d
>     health: HEALTH_ERR
>             2 filesystems are degraded
>             2 filesystems are offline
>
>   services:
>     mon: 5 daemons, quorum barril4,barril3,barril2,barril1,urquell (age 32h)
>     mgr: barril2(active, since 32h), standbys: barril3, barril4, urquell, barril1
>     mds: 0/10 daemons up (10 failed), 9 standby
>     osd: 48 osds: 48 up (since 32h), 48 in (since 2M); 22 remapped pgs
>     rgw: 4 daemons active (4 hosts, 1 zones)
>
>   data:
>     volumes: 0/2 healthy, 2 failed
>     pools:   12 pools, 1475 pgs
>     objects: 50.89M objects, 72 TiB
>     usage:   207 TiB used, 148 TiB / 355 TiB avail
>     pgs:     579358/152674596 objects misplaced (0.379%)
>              1449 active+clean
>              22   active+remapped+backfilling
>              4    active+clean+scrubbing+deep
>
>   io:
>     client:   7.2 MiB/s rd, 1.2 MiB/s wr, 342 op/s rd, 367 op/s wr
>     recovery: 26 MiB/s, 13 keys/s, 26 objects/s
>
>   progress:
>     Global Recovery Event (19h)
>       [===========================.] (remaining: 17m)
>
>
> Ceph fs status:
>
> barril1:~# ceph fs status
> cephfs - 0 clients
> ======
> RANK  STATE  MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
>  0    failed
>  1    failed
>  2    failed
>  3    failed
>  4    failed
>  5    failed
>  6    failed
>  7    failed
>  8    failed
>       POOL          TYPE      USED   AVAIL
>  cephfs_metadata   metadata   1045G  35.6T
>  cephfs.c3sl.data    data      114T  35.6T
> c3sl - 0 clients
> ====
> RANK  STATE  MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
>  0    failed
>       POOL          TYPE      USED   AVAIL
>  cephfs.c3sl.meta  metadata   28.2G  35.6T
>  cephfs.c3sl.data    data      114T  35.6T
> STANDBY MDS
>   barril2
>   barril4
>   barril42
>   barril33
>   barril13
>   barril23
>   barril43
>   barril1
>   barril12
> MDS version: ceph version 17.2.6
> (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
>
> ceph health detail:
>
> barril1:~# ceph health detail
> HEALTH_ERR 2 filesystems are degraded; 2 filesystems are offline
> [WRN] FS_DEGRADED: 2 filesystems are degraded
>     fs cephfs is degraded
>     fs c3sl is degraded
> [ERR] MDS_ALL_DOWN: 2 filesystems are offline
>     fs
> cephfs is offline because no MDS is active for it.
>     fs c3sl is offline because no MDS is active for it.
>
>
> [1]:
> https://docs.ceph.com/en/reef/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery
>
> Best regards,
>
> Odair M. Ditkun Jr
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>

--
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx