Re: MDS Behind on Trimming...

Hello Erich,

Does the workload, by any chance, involve rsync? It is unfortunately
well-known for triggering such issues. A workaround is to export the
directory via NFS and run rsync against the NFS mount instead of
directly against CephFS.
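
If you want to test that workaround, the built-in NFS module (nfs-ganesha
deployed through the orchestrator) makes it fairly quick to set up. Very
roughly -- this is only a sketch, and the cluster name "rsync-nfs", the
pseudo path, and the gateway host pr-md-03 are placeholders, so check the
"ceph nfs" help for your release before running anything:

# ceph nfs cluster create rsync-nfs "1 pr-md-03"
# ceph nfs export create cephfs --cluster-id rsync-nfs \
      --pseudo-path /shapirolab --fsname slugfs \
      --path /private/groups/shapirolab

and then, on the host that runs rsync:

# mount -t nfs -o nfsvers=4.1,port=2049 pr-md-03:/shapirolab /mnt/shapirolab

Point rsync at the NFS mount instead of the CephFS mount and see whether
the MDS stops accumulating segments.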

On Fri, Mar 29, 2024 at 4:58 AM Erich Weiler <weiler@xxxxxxxxxxxx> wrote:
>
> MDS logs show:
>
> Mar 28 13:42:29 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
> log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
> 3676.400077 secs
> Mar 28 13:42:30 pr-md-02.prism ceph-mds[1464328]:
> mds.slugfs.pr-md-02.sbblqq Updating MDS map to version 22775 from mon.3
> Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
> log [WRN] : 320 slow requests, 5 included below; oldest blocked for >
> 3681.400104 secs
> Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
> log [WRN] : slow request 3668.805732 seconds old, received at
> 2024-03-28T19:41:25.772531+0000: client_request(client.99375:574268
> getattr AsXsFs #0x1000c097307 2024-03-28T19:41:25.770954+0000
> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> getattr
> Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
> log [WRN] : slow request 3667.883853 seconds old, received at
> 2024-03-28T19:41:26.694410+0000: client_request(client.99390:374844
> getattr AsXsFs #0x1000c097307 2024-03-28T19:41:26.696172+0000
> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> getattr
> Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
> log [WRN] : slow request 3663.724571 seconds old, received at
> 2024-03-28T19:41:30.853692+0000: client_request(client.99390:375258
> getattr AsXsFs #0x1000c097307 2024-03-28T19:41:30.852166+0000
> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> getattr
> Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
> log [WRN] : slow request 3681.399582 seconds old, received at
> 2024-03-28T19:41:13.178681+0000: client_request(client.99385:11712080
> getattr AsXsFs #0x1000c097307 2024-03-28T19:41:13.178764+0000
> caller_uid=30150, caller_gid=600{600,608,999,}) currently failed to
> rdlock, waiting
> Mar 28 13:42:34 pr-md-02.prism ceph-mds[1464328]: log_channel(cluster)
> log [WRN] : slow request 3680.508972 seconds old, received at
> 2024-03-28T19:41:14.069291+0000: client_request(client.99385:11712556
> getattr AsXsFs #0x1000c097307 2024-03-28T19:41:14.070764+0000
> caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> getattr
>
> The client IDs map to several of our cluster nodes but the inode
> reference always refers to the same directory in these recent logs:
>
> /private/groups/shapirolab/brock/r2/cactus_coord
>
> That directory does not respond to an 'ls', but other directories
> directly above it do just fine.  Maybe it's a bad cache item on the MDS?
>
> # ceph health detail
> HEALTH_WARN 2 clients failing to advance oldest client/flush tid; 1 MDSs
> report slow requests; 1 MDSs behind on trimming
> [WRN] MDS_CLIENT_OLDEST_TID: 2 clients failing to advance oldest
> client/flush tid
>      mds.slugfs.pr-md-02.sbblqq(mds.0): Client mustard failing to
> advance its oldest client/flush tid.  client_id: 101305
>      mds.slugfs.pr-md-01.xdtppo(mds.1): Client  failing to advance its
> oldest client/flush tid.  client_id: 101305
> [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
>      mds.slugfs.pr-md-02.sbblqq(mds.0): 201 slow requests are blocked >
> 30 secs
> [WRN] MDS_TRIM: 1 MDSs behind on trimming
>      mds.slugfs.pr-md-02.sbblqq(mds.0): Behind on trimming (4786/250)
> max_segments: 250, num_segments: 4786
>
> I think this is somehow causing the "slow requests" on the nodes listed
> in the logs, as that directory is inaccessible.  And maybe the 'behind
> on trimming' part is also related, as the MDS can't trim past that
> inode or something?
>
> If I restart the MDS daemon this will clear (I've done it before), but
> it just comes back, often somewhere under the same directory:
> /private/groups/shapirolab/brock/...[something].
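
Before restarting the MDS it may be worth dumping its in-flight requests
to see exactly what they are blocked on -- for example, whether they are
all waiting on a lock held by a single client session. Something along
these lines, substituting the name of the currently active MDS:

# ceph tell mds.slugfs.pr-md-02.sbblqq dump_ops_in_flight

Each op in the dump should carry an event / flag_point such as "failed to
rdlock, waiting", matching what the cluster log shows above.
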
>
> -erich
>
> On 3/28/24 10:11 AM, Erich Weiler wrote:
> > Here are some of the MDS logs:
> >
> > Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : slow request 511.703289 seconds old, received at
> > 2024-03-27T18:49:53.623192+0000: client_request(client.99375:459393
> > getattr AsXsFs #0x100081b9ceb 2024-03-27T18:49:53.620806+0000
> > caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> > getattr
> > Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : slow request 690.189459 seconds old, received at
> > 2024-03-27T18:46:55.137022+0000: client_request(client.99445:4189994
> > getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:55.134857+0000
> > caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> > getattr
> > Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : slow request 686.308604 seconds old, received at
> > 2024-03-27T18:46:59.017876+0000: client_request(client.99445:4190508
> > getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.018864+0000
> > caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> > getattr
> > Mar 27 11:58:25 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : slow request 686.156943 seconds old, received at
> > 2024-03-27T18:46:59.169537+0000: client_request(client.99400:591887
> > getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.170644+0000
> > caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> > getattr
> > Mar 27 11:58:26 pr-md-01.prism ceph-mds[1296468]:
> > mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16631 from mon.0
> > Mar 27 11:58:30 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
> > 699.385743 secs
> > Mar 27 11:58:34 pr-md-01.prism ceph-mds[1296468]:
> > mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16632 from mon.0
> > Mar 27 11:58:35 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
> > 704.385896 secs
> > Mar 27 11:58:38 pr-md-01.prism ceph-mds[1296468]:
> > mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16633 from mon.0
> > Mar 27 11:58:40 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
> > 709.385979 secs
> > Mar 27 11:58:42 pr-md-01.prism ceph-mds[1296468]:
> > mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16634 from mon.0
> > Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : 78 slow requests, 5 included below; oldest blocked for >
> > 714.386040 secs
> > Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : slow request 710.189838 seconds old, received at
> > 2024-03-27T18:46:55.137022+0000: client_request(client.99445:4189994
> > getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:55.134857+0000
> > caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> > getattr
> > Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : slow request 706.308983 seconds old, received at
> > 2024-03-27T18:46:59.017876+0000: client_request(client.99445:4190508
> > getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.018864+0000
> > caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> > getattr
> > Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : slow request 706.157322 seconds old, received at
> > 2024-03-27T18:46:59.169537+0000: client_request(client.99400:591887
> > getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.170644+0000
> > caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> > getattr
> > Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : slow request 706.086751 seconds old, received at
> > 2024-03-27T18:46:59.240108+0000: client_request(client.99400:591894
> > getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:59.242644+0000
> > caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> > getattr
> > Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : slow request 705.196030 seconds old, received at
> > 2024-03-27T18:47:00.130829+0000: client_request(client.99400:591985
> > getattr AsXsFs #0x100081b9ceb 2024-03-27T18:47:00.130641+0000
> > caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> > getattr
> > Mar 27 11:58:45 pr-md-01.prism ceph-mds[1296468]:
> > mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16635 from mon.0
> > Mar 27 11:58:50 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
> > 719.386116 secs
> > Mar 27 11:58:53 pr-md-01.prism ceph-mds[1296468]:
> > mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16636 from mon.0
> > Mar 27 11:58:55 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
> > 724.386184 secs
> > Mar 27 11:58:57 pr-md-01.prism ceph-mds[1296468]:
> > mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16637 from mon.0
> > Mar 27 11:59:00 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : 16 slow requests, 0 included below; oldest blocked for >
> > 729.386333 secs
> > Mar 27 11:59:02 pr-md-01.prism ceph-mds[1296468]:
> > mds.slugfs.pr-md-01.xdtppo Updating MDS map to version 16638 from mon.0
> > Mar 27 11:59:05 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : 53 slow requests, 5 included below; oldest blocked for >
> > 734.386400 secs
> > Mar 27 11:59:05 pr-md-01.prism ceph-mds[1296468]: log_channel(cluster)
> > log [WRN] : slow request 730.190197 seconds old, received at
> > 2024-03-27T18:46:55.137022+0000: client_request(client.99445:4189994
> > getattr AsXsFs #0x100081b9ceb 2024-03-27T18:46:55.134857+0000
> > caller_uid=30150, caller_gid=600{600,608,999,}) currently joining batch
> > getattr
> >
> > Can we tell which client the slow requests are coming from?  It says
> > stuff like "client.99445:4189994" but I don't know how to map that to a
> > client...
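
The number before the colon (99445) is the client session id; the part
after the colon is just that client's request counter. You can map a
session id to a host with something like this (again a sketch -- use the
name of whichever MDS is active):

# ceph tell mds.slugfs.pr-md-01.xdtppo session ls

Each session entry has an "id" field plus a "client_metadata" section
that normally includes the hostname, mount root and client version, which
should tell you which node client.99445 is.
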
> >
> > Thanks for the response!
> >
> > -erich
> >
> > On 3/27/24 21:28, Xiubo Li wrote:
> >>
> >> On 3/28/24 04:03, Erich Weiler wrote:
> >>> Hi All,
> >>>
> >>> I've been battling this for a while and I'm not sure where to go from
> >>> here.  I have a Ceph health warning as such:
> >>>
> >>> # ceph -s
> >>>   cluster:
> >>>     id:     58bde08a-d7ed-11ee-9098-506b4b4da440
> >>>     health: HEALTH_WARN
> >>>             1 MDSs report slow requests
> >>
> >> There are slow requests. I suspect the 'behind on trimming' was
> >> caused by them.
> >>
> >> Could you share the logs about the slow requests? What are they?
> >>
> >> Thanks
> >>
> >>
> >>> 1 MDSs behind on trimming
> >>>
> >>>   services:
> >>>     mon: 5 daemons, quorum
> >>> pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
> >>>     mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
> >>>     mds: 1/1 daemons up, 2 standby
> >>>     osd: 46 osds: 46 up (since 9h), 46 in (since 2w)
> >>>
> >>>   data:
> >>>     volumes: 1/1 healthy
> >>>     pools:   4 pools, 1313 pgs
> >>>     objects: 260.72M objects, 466 TiB
> >>>     usage:   704 TiB used, 424 TiB / 1.1 PiB avail
> >>>     pgs:     1306 active+clean
> >>>              4    active+clean+scrubbing+deep
> >>>              3    active+clean+scrubbing
> >>>
> >>>   io:
> >>>     client:   123 MiB/s rd, 75 MiB/s wr, 109 op/s rd, 1.40k op/s wr
> >>>
> >>> And the specifics are:
> >>>
> >>> # ceph health detail
> >>> HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
> >>> [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
> >>>     mds.slugfs.pr-md-01.xdtppo(mds.0): 99 slow requests are blocked >
> >>> 30 secs
> >>> [WRN] MDS_TRIM: 1 MDSs behind on trimming
> >>>     mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (13884/250)
> >>> max_segments: 250, num_segments: 13884
> >>>
> >>> That "num_segments" number slowly keeps increasing.  I suspect I just
> >>> need to tell the MDS servers to trim faster but after hours of
> >>> googling around I just can't figure out the best way to do it. The
> >>> best I could come up with was to decrease "mds_cache_trim_decay_rate"
> >>> from 1.0 to 0.8 (to start), based on this page:
> >>>
> >>> https://www.suse.com/support/kb/doc/?id=000019740
> >>>
> >>> But it doesn't seem to help; maybe I should decrease it further? I am
> >>> guessing this must be a common issue...?  I am running Reef on the
> >>> MDS servers, but most clients are on Quincy.
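
For reference, those knobs can be inspected and changed cluster-wide with
"ceph config" -- the values below are only examples, not recommendations:

# ceph config get mds mds_log_max_segments
# ceph config set mds mds_cache_trim_decay_rate 0.5

That said, as noted elsewhere in the thread, if requests are blocked the
MDS probably cannot expire the affected journal segments no matter how
the trim rate is tuned, so tuning alone is unlikely to clear the backlog.
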
> >>>
> >>> Thanks for any advice!
> >>>
> >>> cheers,
> >>> erich
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>>
> >>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx



-- 
Alexander E. Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx