On 3/29/24 04:57, Erich Weiler wrote:
MDS logs show:
Mar 28 13:42:29 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
3676.400077 secs
Mar 28 13:42:30 pr-md-02.prism ceph-mds[1464328]:
mds.slugfs.pr-md-02.sbblqq Updating MDS map to version 22775 from mon.3
Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
log [WRN] : 320 slow requests, 5 included below; oldest blocked for >
3681.400104 secs
Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
log [WRN] : slow request 3668.805732 seconds old, received at
2024-03-28T19:41:25.772531+0000: client_request(client.99375:574268
getattr AsXsFs #0x1000c097307 2024-03-28T19:41:25.770954+0000
caller_uid=30150, caller_gid=600{600,608,999,}) currently joining
batch getattr
Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
log [WRN] : slow request 3667.883853 seconds old, received at
2024-03-28T19:41:26.694410+0000: client_request(client.99390:374844
getattr AsXsFs #0x1000c097307 2024-03-28T19:41:26.696172+0000
caller_uid=30150, caller_gid=600{600,608,999,}) currently joining
batch getattr
Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
log [WRN] : slow request 3663.724571 seconds old, received at
2024-03-28T19:41:30.853692+0000: client_request(client.99390:375258
getattr AsXsFs #0x1000c097307 2024-03-28T19:41:30.852166+0000
caller_uid=30150, caller_gid=600{600,608,999,}) currently joining
batch getattr
Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
log [WRN] : slow request 3681.399582 seconds old, received at
2024-03-28T19:41:13.178681+0000: client_request(client.99385:11712080
getattr AsXsFs #0x1000c097307 2024-03-28T19:41:13.178764+0000
caller_uid=30150, caller_gid=600{600,608,999,}) currently failed to
rdlock, waiting
Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
log [WRN] : slow request 3680.508972 seconds old, received at
2024-03-28T19:41:14.069291+0000: client_request(client.99385:11712556
getattr AsXsFs #0x1000c097307 2024-03-28T19:41:14.070764+0000
caller_uid=30150, caller_gid=600{600,608,999,}) currently joining
batch getattr
The client IDs map to several of our cluster nodes but the inode
reference always refers to the same directory in these recent logs:
/private/groups/shapirolab/brock/r2/cactus_coord
That directory does not respond to an 'ls', but other directories
directly above it do just fine. Maybe it's a bad cache item on the MDS?
The previous getattr requests are already stuck, so the following
`ls` command will get stuck as well; this is expected.
The oldest one was stuck trying to get the 'rdlock'.
Could you dump the in-flight ops from the MDS cache?
$ ceph daemon mds.${XXXX} dump_ops_in_flight
$ ceph daemon mds.${XXXX} dump_blocked_ops
$ ceph daemon mds.${XXXX} dump_blocked_ops_count
$ ceph daemon mds.${XXXX} dump_historic_ops
$ ceph daemon mds.${XXXX} dump_historic_ops_by_duration
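If 'jq' is available where you run these, something like the following
could narrow the in-flight ops down to the inode from the logs above
(just a sketch, use the same mds name as in the commands above):
$ ceph daemon mds.${XXXX} dump_ops_in_flight | jq '.ops[] | select(.description | test("0x1000c097307"))'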
# ceph health detail
HEALTH_WARN 2 clients failing to advance oldest client/flush tid; 1
MDSs report slow requests; 1 MDSs behind on trimming
[WRN] MDS_CLIENT_OLDEST_TID: 2 clients failing to advance oldest
client/flush tid
mds.slugfs.pr-md-02.sbblqq(mds.0): Client mustard failing to
advance its oldest client/flush tid. client_id: 101305
mds.slugfs.pr-md-01.xdtppo(mds.1): Client failing to advance its
oldest client/flush tid. client_id: 101305
This is possibly related to https://tracker.ceph.com/issues/63364, which
will be fixed in both cephfs and the kclient.
If it's really caused by this, it could be resolved by issuing any sync
client operation from the corresponding client.
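For example (just a sketch, assuming the kernel mount on the affected
client is at /mnt/cephfs; substitute your real mount point), any
synchronous operation on that mount should do:
$ touch /mnt/cephfs/.tid_kick
$ sync -f /mnt/cephfs    # or just 'sync'
$ rm /mnt/cephfs/.tid_kick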
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
mds.slugfs.pr-md-02.sbblqq(mds.0): 201 slow requests are blocked >
30 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
mds.slugfs.pr-md-02.sbblqq(mds.0): Behind on trimming (4786/250)
max_segments: 250, num_segments: 4786
I think that this is somehow causing the "slow requests" on the nodes
listed in the logs, as that directory is inaccessible. And maybe the
'behind on trimming' part is also related, as it can't trim past that
inode or something?
We need to know why the first 'getattr' was stuck waiting for the
'rdlock' from the MDCache.
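If your MDS build supports the 'dump inode' asok command, dumping that
inode's lock state could help here too (0x1000c097307 is the inode from
the logs above; 'dump cache' is much heavier, since it writes the whole
cache to a file on the MDS host):
$ ceph daemon mds.${XXXX} dump inode 0x1000c097307
$ ceph daemon mds.${XXXX} dump cache /tmp/mdscache.txt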
If I restart the MDS daemon this will clear (I've done it before).
But it just comes back. Often somewhere in the same directory
/private/groups/shapirolab/brock/...[something].
Please try to remount the corresponding kclient instead.
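For example (a sketch; the mount point and options below are
placeholders, use whatever you normally mount with):
$ umount /path/to/cephfs/mount    # add -l (lazy) if tasks are blocked in the mount
$ mount -t ceph <mon_addrs>:/ /path/to/cephfs/mount -o name=<client_name>,secretfile=<secret_file>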
Thanks
- Xiubo
-erich
On 3/28/24 10:11 AM, Erich Weiler wrote:
Here are some of the MDS logs:
Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : slow request 511.703289 seconds old,
received at 2024-03-27T18:49:53.623192+0000:
client_request(client.99375:459393 getattr AsXsFs #0x100081b9ceb
2024-03-27T18:49:53.620806+0000 caller_uid=30150,
caller_gid=600{600,608,999,}) currently joining batch getattr
Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : slow request 690.189459 seconds old,
received at 2024-03-27T18:46:55.137022+0000:
client_request(client.99445:4189994 getattr AsXsFs #0x100081b9ceb
2024-03-27T18:46:55.134857+0000 caller_uid=30150,
caller_gid=600{600,608,999,}) currently joining batch getattr
Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : slow request 686.308604 seconds old,
received at 2024-03-27T18:46:59.017876+0000:
client_request(client.99445:4190508 getattr AsXsFs #0x100081b9ceb
2024-03-27T18:46:59.018864+0000 caller_uid=30150,
caller_gid=600{600,608,999,}) currently joining batch getattr
Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : slow request 686.156943 seconds old,
received at 2024-03-27T18:46:59.169537+0000:
client_request(client.99400:591887 getattr AsXsFs #0x100081b9ceb
2024-03-27T18:46:59.170644+0000 caller_uid=30150,
caller_gid=600{600,608,999,}) currently joining batch getattr
Mar 27 11:58:26 pr-md-01.prism ceph-mds[1296468]:
mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16631 from mon.0
Mar 27 11:58:30 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : 16 slow requests, 0 included below;
oldest blocked for > 699.385743 secs
Mar 27 11:58:34 pr-md-01.prism ceph-mds[1296468]:
mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16632 from mon.0
Mar 27 11:58:35 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : 16 slow requests, 0 included below;
oldest blocked for > 704.385896 secs
Mar 27 11:58:38 pr-md-01.prism ceph-mds[1296468]:
mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16633 from mon.0
Mar 27 11:58:40 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : 16 slow requests, 0 included below;
oldest blocked for > 709.385979 secs
Mar 27 11:58:42 pr-md-01.prism ceph-mds[1296468]:
mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16634 from mon.0
Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : 78 slow requests, 5 included below;
oldest blocked for > 714.386040 secs
Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : slow request 710.189838 seconds old,
received at 2024-03-27T18:46:55.137022+0000:
client_request(client.99445:4189994 getattr AsXsFs #0x100081b9ceb
2024-03-27T18:46:55.134857+0000 caller_uid=30150,
caller_gid=600{600,608,999,}) currently joining batch getattr
Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : slow request 706.308983 seconds old,
received at 2024-03-27T18:46:59.017876+0000:
client_request(client.99445:4190508 getattr AsXsFs #0x100081b9ceb
2024-03-27T18:46:59.018864+0000 caller_uid=30150,
caller_gid=600{600,608,999,}) currently joining batch getattr
Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : slow request 706.157322 seconds old,
received at 2024-03-27T18:46:59.169537+0000:
client_request(client.99400:591887 getattr AsXsFs #0x100081b9ceb
2024-03-27T18:46:59.170644+0000 caller_uid=30150,
caller_gid=600{600,608,999,}) currently joining batch getattr
Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : slow request 706.086751 seconds old,
received at 2024-03-27T18:46:59.240108+0000:
client_request(client.99400:591894 getattr AsXsFs #0x100081b9ceb
2024-03-27T18:46:59.242644+0000 caller_uid=30150,
caller_gid=600{600,608,999,}) currently joining batch getattr
Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : slow request 705.196030 seconds old,
received at 2024-03-27T18:47:00.130829+0000:
client_request(client.99400:591985 getattr AsXsFs #0x100081b9ceb
2024-03-27T18:47:00.130641+0000 caller_uid=30150,
caller_gid=600{600,608,999,}) currently joining batch getattr
Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]:
mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16635 from mon.0
Mar 27 11:58:50 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : 16 slow requests, 0 included below;
oldest blocked for > 719.386116 secs
Mar 27 11:58:53 pr-md-01.prism ceph-mds[1296468]:
mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16636 from mon.0
Mar 27 11:58:55 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : 16 slow requests, 0 included below;
oldest blocked for > 724.386184 secs
Mar 27 11:58:57 pr-md-01.prism ceph-mds[1296468]:
mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16637 from mon.0
Mar 27 11:59:00 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : 16 slow requests, 0 included below;
oldest blocked for > 729.386333 secs
Mar 27 11:59:02 pr-md-01.prism ceph-mds[1296468]:
mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16638 from mon.0
Mar 27 11:59:05 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : 53 slow requests, 5 included below;
oldest blocked for > 734.386400 secs
Mar 27 11:59:05 pr-md-01.prism ceph-mds[1296468]:
log_channel(cluster) log [WRN] : slow request 730.190197 seconds old,
received at 2024-03-27T18:46:55.137022+0000:
client_request(client.99445:4189994 getattr AsXsFs #0x100081b9ceb
2024-03-27T18:46:55.134857+0000 caller_uid=30150,
caller_gid=600{600,608,999,}) currently joining batch getattr
Can we tell which client the slow requests are coming from? It says
stuff like "client.99445:4189994" but I don't know how to map that to
a client...
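(One way to map an id like client.99445 to a host, assuming the
standard MDS admin socket 'session ls' command and 'jq', might be:
$ ceph daemon mds.${XXXX} session ls | jq '.[] | select(.id == 99445) | {id, hostname: .client_metadata.hostname}'
)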
Thanks for the response!
-erich
On 3/27/24 21:28, Xiubo Li wrote:
On 3/28/24 04:03, Erich Weiler wrote:
Hi All,
I've been battling this for a while and I'm not sure where to go
from here. I have a Ceph health warning as such:
# ceph -s
cluster:
id: 58bde08a-d7ed-11ee-9098-506b4b4da440
health: HEALTH_WARN
1 MDSs report slow requests
There were slow requests; I suspect the 'behind on trimming' was
caused by them.
Could you share the logs for the slow requests? What are they?
Thanks
1 MDSs behind on trimming
services:
mon: 5 daemons, quorum
pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
mds: 1/1 daemons up, 2 standby
osd: 46 osds: 46 up (since 9h), 46 in (since 2w)
data:
volumes: 1/1 healthy
pools: 4 pools, 1313 pgs
objects: 260.72M objects, 466 TiB
usage: 704 TiB used, 424 TiB / 1.1 PiB avail
pgs: 1306 active+clean
4 active+clean+scrubbing+deep
3 active+clean+scrubbing
io:
client: 123 MiB/s rd, 75 MiB/s wr, 109 op/s rd, 1.40k op/s wr
And the specifics are:
# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
mds.slugfs.pr-md-01.xdtppo(mds.0): 99 slow requests are blocked
> 30 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming
(13884/250) max_segments: 250, num_segments: 13884
That "num_segments" number slowly keeps increasing. I suspect I
just need to tell the MDS servers to trim faster but after hours of
googling around I just can't figure out the best way to do it. The
best I could come up with was to decrease
"mds_cache_trim_decay_rate" from 1.0 to .8 (to start), based on
this page:
https://www.suse.com/support/kb/doc/?id=000019740
But it doesn't seem to help, maybe I should decrease it further? I
am guessing this must be a common issue...? I am running Reef on
the MDS servers, but most clients are on Quincy.
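(i.e., presumably something along the lines of:
$ ceph config set mds mds_cache_trim_decay_rate 0.8
)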
Thanks for any advice!
cheers,
erich
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx