Hello,
I've encountered an issue where a corrupted inode in the metadata pool
causes the MDS rank to abort (FAILED ceph_assert(diri)) while in the
'rejoin' state. To address this, I'm following the "USING AN ALTERNATE
METADATA POOL FOR RECOVERY" section of the documentation [1].
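For context, the command sequence I'm following looks roughly like the
below (abridged from [1]; cephfs_recovery and cephfs_recovery_meta are
the placeholder names used in the docs, and <original fs name> /
<original data pool> stand in for my actual names):

# Create a fresh metadata pool and a recovery filesystem on top of the
# existing data pool.
ceph osd pool create cephfs_recovery_meta
ceph fs new cephfs_recovery cephfs_recovery_meta <original data pool> --recover --allow-dangerous-metadata-overlay

# Reset the recovery filesystem's session/snap/inode tables.
cephfs-table-tool cephfs_recovery:0 reset session
cephfs-table-tool cephfs_recovery:0 reset snap
cephfs-table-tool cephfs_recovery:0 reset inode

# Rebuild metadata into the alternate pool from the data pool, then
# resolve hard links and build dirfrags.
cephfs-data-scan init --force-init --filesystem cephfs_recovery --alternate-pool cephfs_recovery_meta
cephfs-data-scan scan_extents --alternate-pool cephfs_recovery_meta --filesystem <original fs name> <original data pool>
cephfs-data-scan scan_inodes --alternate-pool cephfs_recovery_meta --filesystem <original fs name> --force-corrupt <original data pool>
cephfs-data-scan scan_links --filesystem cephfs_recovery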
However, the cephfs-data-scan scan_links step has now been running for
over 24 hours. The filesystem holds roughly 35 TB of data, which with
3x replication amounts to more than 100 TB of raw data. Does anyone
have a rough estimate of how long this step should take on a dataset
of this size?
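In case it helps with comparing notes: I haven't found a progress
indicator for scan_links, so the only rough proxy I have (my own
improvisation, not something from [1]) is watching the object count of
the alternate metadata pool grow while it runs:

# Rough progress proxy only (assumes the docs' placeholder pool name
# cephfs_recovery_meta): the OBJECTS column for the alternate metadata
# pool should keep growing while scan_links is writing.
watch -n 60 'ceph df | grep cephfs_recovery_meta'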
Additional detail: the log from the crashing MDS:
-9> 2023-10-11T10:13:22.254-0300 7ff901f75700 10 monclient: get_auth_request con 0x559bf41e4400 auth_method 0
-8> 2023-10-11T10:13:22.254-0300 7ff8ff770700 5 mds.barril12 handle_mds_map old map epoch 472481 <= 472481, discarding
-7> 2023-10-11T10:13:22.254-0300 7ff8ff770700 0 mds.0.cache missing dir for * (which maps to *) on [inode 0x10021afaf90 [...392,head] /dbteamvenv/ auth v98534854 snaprealm=0x559bf427ce00 f(v60 m2023-10-06T15:35:03.278089-0300 9=0+9) n(v141971 rc2023-10-09T18:41:19.742089-0300 b1424948533453 139810=131460+8350) (iversion lock) 0x559bf4298580]
-6> 2023-10-11T10:13:22.254-0300 7ff8ff770700 0 mds.0.cache missing dir ino 0x20005dd786b
-5> 2023-10-11T10:13:22.254-0300 7ff902776700 10 monclient: get_auth_request con 0x559bf4142c00 auth_method 0
-4> 2023-10-11T10:13:22.258-0300 7ff902f77700 5 mds.beacon.barril12 received beacon reply up:rejoin seq 4 rtt 1.09601
-3> 2023-10-11T10:13:22.258-0300 7ff8ff770700 -1 ./src/mds/MDCache.cc: In function 'void MDCache::handle_cache_rejoin_weak(ceph::cref_t<MMDSCacheRejoin>&)' thread 7ff8ff770700 time 2023-10-11T10:13:22.259535-0300
./src/mds/MDCache.cc: 4462: FAILED ceph_assert(diri)
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x7ff904a5b282]
2: /usr/lib/ceph/libceph-common.so.2(+0x25b420) [0x7ff904a5b420]
3: (MDCache::handle_cache_rejoin_weak(boost::intrusive_ptr<MMDSCacheRejoin const> const&)+0x20de) [0x559bf0a9da6e]
4: (MDCache::dispatch(boost::intrusive_ptr<Message const> const&)+0x424) [0x559bf0aa2a64]
5: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x5c0) [0x559bf0930580]
6: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x58) [0x559bf0930b78]
7: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x1bf) [0x559bf090b5df]
8: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7ff904ca71d8]
9: (DispatchQueue::entry()+0x5ef) [0x7ff904ca48df]
10: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ff904d681cd]
11: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7ff905680ea7]
12: clone()
-2> 2023-10-11T10:13:22.258-0300 7ff902f77700 10 monclient: get_auth_request con 0x559bf41e4c00 auth_method 0
-1> 2023-10-11T10:13:22.258-0300 7ff902f77700 10 monclient: get_auth_request con 0x559bf41e5400 auth_method 0
0> 2023-10-11T10:13:22.262-0300 7ff8ff770700 -1 *** Caught signal (Aborted) **
in thread 7ff8ff770700 thread_name:ms_dispatch
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7ff90568c140]
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x17e) [0x7ff904a5b2dc]
5: /usr/lib/ceph/libceph-common.so.2(+0x25b420) [0x7ff904a5b420]
6: (MDCache::handle_cache_rejoin_weak(boost::intrusive_ptr<MMDSCacheRejoin const> const&)+0x20de) [0x559bf0a9da6e]
7: (MDCache::dispatch(boost::intrusive_ptr<Message const> const&)+0x424) [0x559bf0aa2a64]
8: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x5c0) [0x559bf0930580]
9: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x58) [0x559bf0930b78]
10: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x1bf) [0x559bf090b5df]
11: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7ff904ca71d8]
12: (DispatchQueue::entry()+0x5ef) [0x7ff904ca48df]
13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ff904d681cd]
14: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7ff905680ea7]
15: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Ceph Cluster status:
barril1:~# ceph status
cluster:
id: c30ecc8d-440e-4608-b3fe-5020337ae11d
health: HEALTH_ERR
2 filesystems are degraded
2 filesystems are offline
services:
mon: 5 daemons, quorum barril4,barril3,barril2,barril1,urquell (age 32h)
mgr: barril2(active, since 32h), standbys: barril3, barril4, urquell, barril1
mds: 0/10 daemons up (10 failed), 9 standby
osd: 48 osds: 48 up (since 32h), 48 in (since 2M); 22 remapped pgs
rgw: 4 daemons active (4 hosts, 1 zones)
data:
volumes: 0/2 healthy, 2 failed
pools: 12 pools, 1475 pgs
objects: 50.89M objects, 72 TiB
usage: 207 TiB used, 148 TiB / 355 TiB avail
pgs: 579358/152674596 objects misplaced (0.379%)
1449 active+clean
22 active+remapped+backfilling
4 active+clean+scrubbing+deep
io:
client: 7.2 MiB/s rd, 1.2 MiB/s wr, 342 op/s rd, 367 op/s wr
recovery: 26 MiB/s, 13 keys/s, 26 objects/s
progress:
Global Recovery Event (19h)
[===========================.] (remaining: 17m)
Ceph fs status:
barril1:~# ceph fs status
cephfs - 0 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 failed
1 failed
2 failed
3 failed
4 failed
5 failed
6 failed
7 failed
8 failed
POOL TYPE USED AVAIL
cephfs_metadata metadata 1045G 35.6T
cephfs.c3sl.data data 114T 35.6T
c3sl - 0 clients
====
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 failed
POOL TYPE USED AVAIL
cephfs.c3sl.meta metadata 28.2G 35.6T
cephfs.c3sl.data data 114T 35.6T
STANDBY MDS
barril2
barril4
barril42
barril33
barril13
barril23
barril43
barril1
barril12
MDS version: ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
ceph health detail:
barril1:~# ceph health detail
HEALTH_ERR 2 filesystems are degraded; 2 filesystems are offline
[WRN] FS_DEGRADED: 2 filesystems are degraded
fs cephfs is degraded
fs c3sl is degraded
[ERR] MDS_ALL_DOWN: 2 filesystems are offline
fs cephfs is offline because no MDS is active for it.
fs c3sl is offline because no MDS is active for it.
[1]: https://docs.ceph.com/en/reef/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery
Best regards,
Odair M. Ditkun Jr