After some more digging: all three MDSs enter the up:rejoin state when
restarted but never move on from there. Also, MDS rank 0 (not the one with
the trimming problem) consistently logs the following on restart:

    mds.0.cache failed to open ino 0x101 err -116/0
    mds.0.cache failed to open ino 0x102 err -116/0

On Thu, Jul 8, 2021 at 6:29 AM Zachary Ulissi <zulissi@xxxxxxxxx> wrote:

> We're running a rook-ceph cluster that has gotten stuck in "1 MDSs behind
> on trimming".
>
> * One filesystem, three active MDS daemons, each with a standby
> * Quite a few files (20M objects) and daily snapshots. This might be a
>   problem?
> * Ceph Pacific 16.2.4
>
> * `ceph health detail` doesn't provide much help (see below)
> * num_segments is very slowly increasing over time
> * Restarting all of the MDSs returns to the same point.
> * Moderate CPU usage on each MDS server (~30% for the stuck one, ~80% of
>   a core for the others)
> * Logs for the stuck MDS look clean: it hits rejoin_joint_start, then only
>   standard "updating MDS map to version XXX" messages
> * `ceph daemon mds.x ops` shows no active ops on any of the MDS servers
> * `mds_log_max_segments` is set to 128; setting it to a higher number makes
>   the warning go away, but the filesystem remains degraded, and setting it
>   back to 128 shows num_segments has not changed.
> * I've tried playing around with other MDS settings based on various posts
>   on this list and elsewhere, to no avail.
> * `cephfs-journal-tool journal inspect` for each rank says journal
>   integrity is fine.
>
> Something similar happened last week and (probably by accident, by
> removing/adding nodes?) I got the MDSs to start recovering and the
> filesystem went back to healthy.
>
> I'm at a bit of a loss for what else to try.
>
> Thanks!
> Zack
>
>
> `ceph health detail`
> HEALTH_WARN mons are allowing insecure global_id reclaim; 1 filesystem is
> degraded; 1 MDSs behind on trimming; mon x is low on available space
> [WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure
> global_id reclaim
>     mon.x has auth_allow_insecure_global_id_reclaim set to true
>     mon.ad has auth_allow_insecure_global_id_reclaim set to true
>     mon.af has auth_allow_insecure_global_id_reclaim set to true
> [WRN] FS_DEGRADED: 1 filesystem is degraded
>     fs myfs is degraded
> [WRN] MDS_TRIM: 1 MDSs behind on trimming
>     mds.myfs-d(mds.2): Behind on trimming (340/128) max_segments: 128,
>       num_segments: 340
> [WRN] MON_DISK_LOW: mon x is low on available space
>     mon.x has 22% avail
>
> `ceph config get mds`
> WHO     MASK  LEVEL     OPTION                              VALUE        RO
> global        basic     log_file                                         *
> global        basic     log_to_file                         false
> mds           basic     mds_cache_memory_limit              17179869184
> mds           advanced  mds_cache_trim_decay_rate           1.000000
> mds           advanced  mds_cache_trim_threshold            1048576
> mds           advanced  mds_log_max_segments                128
> mds           advanced  mds_recall_max_caps                 5000
> mds           advanced  mds_recall_max_decay_rate           2.500000
> global        advanced  mon_allow_pool_delete               true
> global        advanced  mon_allow_pool_size_one             true
> global        advanced  mon_cluster_log_file
> global        advanced  mon_pg_warn_min_per_osd             0
> global        advanced  osd_pool_default_pg_autoscale_mode  on
> global        advanced  osd_scrub_auto_repair               true
> global        advanced  rbd_default_features                3
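
In case it's useful to anyone following along, this is roughly what I'm
running to dig further. It's only a sketch: I'm assuming the filesystem name
myfs and the daemon name myfs-d from the health output above (the other
daemons are analogous), the `ceph daemon` calls have to run on the host/pod
where that MDS actually lives, and debug_mds 10 is just a guess at a
verbosity that shows what rejoin is waiting on:

  # watch the ranks sit in up:rejoin after a restart
  ceph fs status myfs

  # turn up MDS logging while the daemons are stuck
  ceph config set mds debug_mds 10

  # ask the stuck daemon for its own view of its state
  ceph daemon mds.myfs-d status

  # re-check the journal for every rank
  cephfs-journal-tool --rank=myfs:0 journal inspect
  cephfs-journal-tool --rank=myfs:1 journal inspect
  cephfs-journal-tool --rank=myfs:2 journal inspect

(Remembering to drop the debug level again afterwards with
`ceph config rm mds debug_mds`.)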