Hi Erich,

On Mon, Apr 8, 2024 at 11:51 AM Erich Weiler <weiler@xxxxxxxxxxxx> wrote:
>
> Hi Xiubo,
>
> > Thanks for your logs, and it should be the same issue with
> > https://tracker.ceph.com/issues/62052, could you try to test with this
> > fix again ?
>
> This sounds good - but I'm not clear on what I should do? I see a patch
> in that tracker page, is that what you are referring to? If so, how
> would I apply such a patch? Or is there simply a binary update I can
> apply somehow to the MDS server software?

The backport of this patch (https://github.com/ceph/ceph/pull/53241)
was merged on October 18, 2023, and Ceph 18.2.1 was released on
December 18, 2023. Therefore, if you are running Ceph 18.2.1 on the
server side, you already have the fix (see the commands at the end of
this message for a way to verify the running version). If you are
already running 18.2.1 or 18.2.2 (to which you should upgrade anyway)
and still hit this bug, please complain, as the purported fix would
then be ineffective.

> Thanks for helping!
>
> -erich
>
> > Please let me know if you still could see this bug then it should be the
> > locker order bug as https://tracker.ceph.com/issues/62123.
> >
> > Thanks
> >
> > - Xiubo
> >
> > On 3/28/24 04:03, Erich Weiler wrote:
> >> Hi All,
> >>
> >> I've been battling this for a while and I'm not sure where to go from
> >> here. I have a Ceph health warning as such:
> >>
> >> # ceph -s
> >>   cluster:
> >>     id:     58bde08a-d7ed-11ee-9098-506b4b4da440
> >>     health: HEALTH_WARN
> >>             1 MDSs report slow requests
> >>             1 MDSs behind on trimming
> >>
> >>   services:
> >>     mon: 5 daemons, quorum
> >> pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
> >>     mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
> >>     mds: 1/1 daemons up, 2 standby
> >>     osd: 46 osds: 46 up (since 9h), 46 in (since 2w)
> >>
> >>   data:
> >>     volumes: 1/1 healthy
> >>     pools:   4 pools, 1313 pgs
> >>     objects: 260.72M objects, 466 TiB
> >>     usage:   704 TiB used, 424 TiB / 1.1 PiB avail
> >>     pgs:     1306 active+clean
> >>              4    active+clean+scrubbing+deep
> >>              3    active+clean+scrubbing
> >>
> >>   io:
> >>     client: 123 MiB/s rd, 75 MiB/s wr, 109 op/s rd, 1.40k op/s wr
> >>
> >> And the specifics are:
> >>
> >> # ceph health detail
> >> HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
> >> [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
> >>     mds.slugfs.pr-md-01.xdtppo(mds.0): 99 slow requests are blocked >
> >> 30 secs
> >> [WRN] MDS_TRIM: 1 MDSs behind on trimming
> >>     mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (13884/250)
> >> max_segments: 250, num_segments: 13884
> >>
> >> That "num_segments" number slowly keeps increasing. I suspect I just
> >> need to tell the MDS servers to trim faster but after hours of
> >> googling around I just can't figure out the best way to do it. The
> >> best I could come up with was to decrease "mds_cache_trim_decay_rate"
> >> from 1.0 to .8 (to start), based on this page:
> >>
> >> https://www.suse.com/support/kb/doc/?id=000019740
> >>
> >> But it doesn't seem to help, maybe I should decrease it further? I am
> >> guessing this must be a common issue...? I am running Reef on the MDS
> >> servers, but most clients are on Quincy.
> >>
> >> Thanks for any advice!
> >>
> >> cheers,
> >> erich
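
P.S. To verify which version your daemons are actually running (for
example, in case an old container or binary is still in place after an
upgrade), the standard CLI can report it; the MDS name below is simply
copied from your health output:

# ceph versions
# ceph tell mds.slugfs.pr-md-01.xdtppo version

The first command lists all running daemons grouped by version; the
second asks one specific MDS directly.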
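
P.P.S. Regarding the mds_cache_trim_decay_rate experiment quoted above:
rather than editing ceph.conf on the MDS hosts, the option can be
changed and checked at runtime via the config database, for example
(0.8 is just the value you already tried, not a recommendation):

# ceph config set mds mds_cache_trim_decay_rate 0.8
# ceph config get mds mds_cache_trim_decay_rate

Also note that the 250 in "Behind on trimming (13884/250)" corresponds
to mds_log_max_segments: MDS_TRIM is about journal segments, so the
journal trimming settings may be more relevant here than the cache
decay rate.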
-- 
Alexander E. Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx