On 4/8/24 12:32, Erich Weiler wrote:
Ah, I see. Yes, we are already running version 18.2.1 on the server side (we just installed this cluster a few weeks ago from scratch). So I guess if the fix has already been backported to that version, then we still have a problem.
Does that mean it could be the locker order bug (https://tracker.ceph.com/issues/62123) as Xiubo suggested?
I have raised a PR to fix the lock order issue; if possible, please give it
a try and see whether it resolves this issue.
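If your cluster is deployed with cephadm, one way to try such a fix is to redeploy the MDS daemons from the CI container image built for that PR once it is available. A rough sketch only (the image tag is just a placeholder for whatever build the PR's CI produces, and please double-check the exact cephadm syntax for your version; the same step would be repeated for the standby MDS daemons):

# ceph orch daemon redeploy mds.slugfs.pr-md-01.xdtppo --image quay.ceph.io/ceph-ci/ceph:<build-sha>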
Thanks
- Xiubo
Thanks again,
Erich
On Apr 7, 2024, at 9:00 PM, Alexander E. Patrakov <patrakov@xxxxxxxxx> wrote:
Hi Erich,
On Mon, Apr 8, 2024 at 11:51 AM Erich Weiler <weiler@xxxxxxxxxxxx> wrote:
Hi Xiubo,
Thanks for your logs; it should be the same issue as
https://tracker.ceph.com/issues/62052. Could you try testing with this
fix again?
This sounds good - but I'm not clear on what I should do. I see a patch
on that tracker page; is that what you are referring to? If so, how
would I apply such a patch? Or is there simply a binary update I can
apply somehow to the MDS server software?
The backport of this patch (https://github.com/ceph/ceph/pull/53241)
was merged on October 18, 2023, and Ceph 18.2.1 was released on
December 18, 2023. Therefore, if you are running Ceph 18.2.1 on the
server side, you already have the fix. If you are already on version
18.2.1 or 18.2.2 (to which you should upgrade anyway) and still hit
the problem, please complain, as the purported fix is then ineffective.
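To confirm which version the daemons are actually running (the container image in use can differ from what is installed on the host), something like this should show it; the MDS daemon name below is taken from your health output:

# ceph versions
# ceph tell mds.slugfs.pr-md-01.xdtppo version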
Thanks for helping!
-erich
Please let me know if you still see this bug; if so, it should be the
locker order bug tracked at https://tracker.ceph.com/issues/62123.
Thanks
- Xiubo
On 3/28/24 04:03, Erich Weiler wrote:
Hi All,
I've been battling this for a while and I'm not sure where to go from
here. I have a Ceph health warning as follows:
# ceph -s
  cluster:
    id:     58bde08a-d7ed-11ee-9098-506b4b4da440
    health: HEALTH_WARN
            1 MDSs report slow requests
            1 MDSs behind on trimming

  services:
    mon: 5 daemons, quorum pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
    mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
    mds: 1/1 daemons up, 2 standby
    osd: 46 osds: 46 up (since 9h), 46 in (since 2w)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 1313 pgs
    objects: 260.72M objects, 466 TiB
    usage:   704 TiB used, 424 TiB / 1.1 PiB avail
    pgs:     1306 active+clean
             4    active+clean+scrubbing+deep
             3    active+clean+scrubbing

  io:
    client:   123 MiB/s rd, 75 MiB/s wr, 109 op/s rd, 1.40k op/s wr
And the specifics are:
# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.slugfs.pr-md-01.xdtppo(mds.0): 99 slow requests are blocked > 30 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (13884/250) max_segments: 250, num_segments: 13884
That "num_segments" number slowly keeps increasing. I suspect I just
need to tell the MDS servers to trim faster but after hours of
googling around I just can't figure out the best way to do it. The
best I could come up with was to decrease "mds_cache_trim_decay_rate"
from 1.0 to .8 (to start), based on this page:
https://www.suse.com/support/kb/doc/?id=000019740
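For reference, a minimal sketch of how the change can be applied and verified through the config database (the mds_log_max_segments line is only there because 250 is the max_segments limit shown in the warning; I haven't touched it):

# ceph config set mds mds_cache_trim_decay_rate 0.8
# ceph config show mds.slugfs.pr-md-01.xdtppo mds_cache_trim_decay_rate
# ceph config get mds mds_log_max_segments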
But it doesn't seem to help; maybe I should decrease it further? I am
guessing this must be a common issue? I am running Reef on the MDS
servers, but most clients are on Quincy.
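I can also dump the blocked requests from the active MDS if that would help; a sketch of what I believe the commands are on a cephadm deployment, run on the MDS host (pr-md-01 here):

# cephadm enter --name mds.slugfs.pr-md-01.xdtppo
# ceph daemon mds.slugfs.pr-md-01.xdtppo dump_blocked_ops
# ceph daemon mds.slugfs.pr-md-01.xdtppo dump_ops_in_flight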
Thanks for any advice!
cheers,
erich
--
Alexander E. Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx