Hi Odair,

On Thu, Oct 12, 2023 at 11:58 PM Odair M. <omdjunior@xxxxxxxxxxx> wrote:
>
> Hello,
>
> I've encountered an issue where the metadata pool has corrupted a cache
> inode, leading to an MDS rank abort in the 'reconnect' state. To address
> this, I'm following the "USING AN ALTERNATE METADATA POOL FOR RECOVERY"
> section from the documentation [1].

Using an alternate metadata pool for recovery hasn't been tested to any
great extent, which is why the document you refer to carries a warning
about it.

> However, I've observed that the cephfs-data-scan scan_links step has
> been running for over 24 hours on 35 TB of data, which is replicated
> across 3 OSDs, resulting in more than 100 TB of raw data. Does anyone
> have an estimation on the duration for this step?

scan_links has to iterate over every object in the metadata pool and,
for each object, over its omap key/values, so this step scales with the
number of objects in the metadata pool, i.e., the number of directories
and files in the file system. It's a bit hard to provide a time
estimate, but progress reporting is a feature we would like to add to
these tools.

> Additional detail: The corrupted mds log:
>
>     -9> 2023-10-11T10:13:22.254-0300 7ff901f75700 10 monclient:
> get_auth_request con 0x559bf41e4400 auth_method 0
>     -8> 2023-10-11T10:13:22.254-0300 7ff8ff770700  5 mds.barril12
> handle_mds_map old map epoch 472481 <= 472481, discarding
>     -7> 2023-10-11T10:13:22.254-0300 7ff8ff770700  0 mds.0.cache
> missing dir for * (which maps to *) on [inode 0x10021afaf90

I'm not sure what happened, and since this is a recent Ceph version
(17.2.6), we should put the details in a tracker to get an RCA of what
could have caused this.
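On the estimation point: a rough upper bound on the work scan_links has
to do is the object count of the metadata pool, e.g. via
`rados -p cephfs_metadata ls | wc -l` (substitute your pool name). With
that count and an observed processing rate you can sketch a crude ETA.
The numbers below are hypothetical placeholders, not measurements from
your cluster:

```shell
# Crude ETA sketch; both values are hypothetical placeholders.
objects=50000000   # e.g. from: rados -p cephfs_metadata ls | wc -l
rate=600           # objects/s, estimated from how fast the tool advances
echo "$(( objects / rate / 3600 )) hours remaining"   # prints "23 hours remaining"
```

This is only a back-of-the-envelope bound: omap sizes vary per object,
so the actual rate is not constant across the pool.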
> [...392,head] /dbteamvenv/ auth v98534854 snaprealm=0x559bf427ce00 f(v60
> m2023-10-06T15:35:03.278089-0300 9=0+9) n(v141971
> rc2023-10-09T18:41:19.742089-0300 b1424948533453 139810=131460+8350)
> (iversion lock) 0x559bf4298580]
>     -6> 2023-10-11T10:13:22.254-0300 7ff8ff770700  0 mds.0.cache
> missing dir ino 0x20005dd786b
>     -5> 2023-10-11T10:13:22.254-0300 7ff902776700 10 monclient:
> get_auth_request con 0x559bf4142c00 auth_method 0
>     -4> 2023-10-11T10:13:22.258-0300 7ff902f77700  5 mds.beacon.barril12
> received beacon reply up:rejoin seq 4 rtt 1.09601
>     -3> 2023-10-11T10:13:22.258-0300 7ff8ff770700 -1
> ./src/mds/MDCache.cc: In function 'void
> MDCache::handle_cache_rejoin_weak(ceph::cref_t<MMDSCacheRejoin>&)'
> thread 7ff8ff770700 time 2023-10-11T10:13:22.259535-0300
> ./src/mds/MDCache.cc: 4462: FAILED ceph_assert(diri)
>
>  ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x124) [0x7ff904a5b282]
>  2: /usr/lib/ceph/libceph-common.so.2(+0x25b420) [0x7ff904a5b420]
>  3:
> (MDCache::handle_cache_rejoin_weak(boost::intrusive_ptr<MMDSCacheRejoin
> const> const&)+0x20de) [0x559bf0a9da6e]
>  4: (MDCache::dispatch(boost::intrusive_ptr<Message const>
> const&)+0x424) [0x559bf0aa2a64]
>  5: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&,
> bool)+0x5c0) [0x559bf0930580]
>  6: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const>
> const&)+0x58) [0x559bf0930b78]
>  7: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message>
> const&)+0x1bf) [0x559bf090b5df]
>  8: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message>
> const&)+0x468) [0x7ff904ca71d8]
>  9: (DispatchQueue::entry()+0x5ef) [0x7ff904ca48df]
>  10: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ff904d681cd]
>  11: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7ff905680ea7]
>  12: clone()
>
>     -2> 2023-10-11T10:13:22.258-0300 7ff902f77700 10 monclient:
> get_auth_request con
> 0x559bf41e4c00 auth_method 0
>     -1> 2023-10-11T10:13:22.258-0300 7ff902f77700 10 monclient:
> get_auth_request con 0x559bf41e5400 auth_method 0
>      0> 2023-10-11T10:13:22.262-0300 7ff8ff770700 -1 *** Caught signal
> (Aborted) **
>  in thread 7ff8ff770700 thread_name:ms_dispatch
>
>  ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
> (stable)
>  1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7ff90568c140]
>  2: gsignal()
>  3: abort()
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x17e) [0x7ff904a5b2dc]
>  5: /usr/lib/ceph/libceph-common.so.2(+0x25b420) [0x7ff904a5b420]
>  6:
> (MDCache::handle_cache_rejoin_weak(boost::intrusive_ptr<MMDSCacheRejoin
> const> const&)+0x20de) [0x559bf0a9da6e]
>  7: (MDCache::dispatch(boost::intrusive_ptr<Message const>
> const&)+0x424) [0x559bf0aa2a64]
>  8: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&,
> bool)+0x5c0) [0x559bf0930580]
>  9: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const>
> const&)+0x58) [0x559bf0930b78]
>  10: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message>
> const&)+0x1bf) [0x559bf090b5df]
>  11: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message>
> const&)+0x468) [0x7ff904ca71d8]
>  12: (DispatchQueue::entry()+0x5ef) [0x7ff904ca48df]
>  13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ff904d681cd]
>  14: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7ff905680ea7]
>  15: clone()
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
>
> Ceph cluster status:
>
> barril1:~# ceph status
>   cluster:
>     id:     c30ecc8d-440e-4608-b3fe-5020337ae11d
>     health: HEALTH_ERR
>             2 filesystems are degraded
>             2 filesystems are offline
>
>   services:
>     mon: 5 daemons, quorum barril4,barril3,barril2,barril1,urquell (age 32h)
>     mgr: barril2(active, since 32h), standbys: barril3, barril4, urquell, barril1
>     mds: 0/10 daemons up (10 failed), 9 standby
>     osd: 48 osds: 48 up (since 32h), 48 in (since 2M); 22 remapped pgs
>     rgw: 4 daemons active (4 hosts, 1 zones)
>
>   data:
>     volumes: 0/2 healthy, 2 failed
>     pools:   12 pools, 1475 pgs
>     objects: 50.89M objects, 72 TiB
>     usage:   207 TiB used, 148 TiB / 355 TiB avail
>     pgs:     579358/152674596 objects misplaced (0.379%)
>              1449 active+clean
>              22   active+remapped+backfilling
>              4    active+clean+scrubbing+deep
>
>   io:
>     client:   7.2 MiB/s rd, 1.2 MiB/s wr, 342 op/s rd, 367 op/s wr
>     recovery: 26 MiB/s, 13 keys/s, 26 objects/s
>
>   progress:
>     Global Recovery Event (19h)
>       [===========================.] (remaining: 17m)
>
>
> Ceph fs status:
>
> barril1:~# ceph fs status
> cephfs - 0 clients
> ======
> RANK  STATE  MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
>  0    failed
>  1    failed
>  2    failed
>  3    failed
>  4    failed
>  5    failed
>  6    failed
>  7    failed
>  8    failed
>       POOL          TYPE      USED   AVAIL
>  cephfs_metadata   metadata   1045G  35.6T
>  cephfs.c3sl.data    data      114T  35.6T
> c3sl - 0 clients
> ====
> RANK  STATE  MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
>  0    failed
>       POOL          TYPE      USED   AVAIL
>  cephfs.c3sl.meta  metadata   28.2G  35.6T
>  cephfs.c3sl.data    data      114T  35.6T
> STANDBY MDS
>   barril2
>   barril4
>   barril42
>   barril33
>   barril13
>   barril23
>   barril43
>   barril1
>   barril12
> MDS version: ceph version 17.2.6
> (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
>
> ceph health detail:
>
> barril1:~# ceph health detail
> HEALTH_ERR 2 filesystems are degraded; 2 filesystems are offline
> [WRN] FS_DEGRADED: 2 filesystems are degraded
>     fs cephfs is degraded
>     fs c3sl is degraded
> [ERR] MDS_ALL_DOWN: 2 filesystems are offline
>     fs
> cephfs is offline because no MDS is active for it.
>     fs c3sl is offline because no MDS is active for it.
>
>
> [1]:
> https://docs.ceph.com/en/reef/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery
>
> Best regards,
>
> Odair M. Ditkun Jr
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>

--
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx