Re: Stuck in replay?

Sake Ceph <ceph@xxxxxxxxxxx> · Mon, 22 Apr 2024 20:37:46 +0200 (CEST)

Just a question: is it possible to block or disable all clients, to take load off the system? 

Kind regards, 
Sake 
> On 22-04-2024 20:33 CEST, Erich Weiler <weiler@xxxxxxxxxxxx> wrote:
> 
>
> I also see this from 'ceph health detail':
> 
> # ceph health detail
> HEALTH_WARN 1 filesystem is degraded; 1 MDSs report oversized cache; 1 MDSs behind on trimming
> [WRN] FS_DEGRADED: 1 filesystem is degraded
>      fs slugfs is degraded
> [WRN] MDS_CACHE_OVERSIZED: 1 MDSs report oversized cache
>      mds.slugfs.pr-md-01.xdtppo(mds.0): MDS cache is too large (19GB/8GB); 0 inodes in use by clients, 0 stray files
> [WRN] MDS_TRIM: 1 MDSs behind on trimming
>      mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (127084/250) max_segments: 250, num_segments: 127084
> 
> MDS cache too large? The MDS process is using 22GB right now and my
> server is starting to swap, so maybe it really is too large...
> 
> On 4/22/24 11:17 AM, Erich Weiler wrote:
> > Hi All,
> > 
> > We have a somewhat serious situation with a CephFS filesystem
> > (18.2.1) that has 2 active MDSs (plus one standby). I tried to restart
> > one of the active daemons to unstick a bunch of blocked requests, and
> > the standby went into 'replay' for a very long time. RAM on that MDS
> > server then filled up; it sat there for a while, eventually appeared
> > to give up and failed over to the standby, and then the cycle started
> > again. So I restarted that MDS, and now I'm in a situation where I see
> > this:
> > 
> > # ceph fs status
> > slugfs - 29 clients
> > ======
> > RANK   STATE            MDS            ACTIVITY   DNS    INOS   DIRS   CAPS
> >   0     replay  slugfs.pr-md-01.xdtppo            3958k  57.1k  12.2k     0
> >   1    resolve  slugfs.pr-md-02.sbblqq               0      3      1      0
> >         POOL           TYPE     USED  AVAIL
> >   cephfs_metadata    metadata   997G  2948G
> > cephfs_md_and_data    data       0   87.6T
> >     cephfs_data        data     773T   175T
> >       STANDBY MDS
> > slugfs.pr-md-03.mclckv
> > MDS version: ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)
> > 
> > It just stays there indefinitely.  All my clients are hung.  I tried 
> > restarting all MDS daemons and they just went back to this state after 
> > coming back up.
> > 
> > Is there any way I can somehow escape this state of indefinite 
> > replay/resolve?
> > 
> > Thanks so much!  I'm kinda nervous since none of my clients have 
> > filesystem access at the moment...
> > 
> > cheers,
> > erich
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
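For anyone watching a stuck replay like this, the two numbers that stand out in the `ceph health detail` output quoted above are the journal backlog (127084 segments against a `max_segments` of 250, roughly 508 times over) and the cache (19GB against an 8GB limit, about 2.4 times over). Below is a minimal Python sketch that pulls those figures out of the raw, unquoted text output so they can be compared between samples. The regexes are written against the exact message wording shown here and are an assumption for other Ceph releases; `MdsBacklog` and `parse_health_detail` are hypothetical names, not part of any Ceph tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns follow the wording of the 18.2.1 health messages quoted above;
# other releases may phrase them differently (an assumption, not a guarantee).
TRIM_RE = re.compile(r"Behind on trimming \((\d+)/(\d+)\)")
CACHE_RE = re.compile(r"MDS cache is too large \(([\d.]+)GB/([\d.]+)GB\)")


@dataclass
class MdsBacklog:
    """Hypothetical container for the MDS figures of interest."""
    num_segments: Optional[int] = None
    max_segments: Optional[int] = None
    cache_used_gb: Optional[float] = None
    cache_limit_gb: Optional[float] = None

    def trim_ratio(self) -> Optional[float]:
        """How many times over max_segments the journal currently is."""
        if self.num_segments is None or not self.max_segments:
            return None
        return self.num_segments / self.max_segments

    def cache_ratio(self) -> Optional[float]:
        """How many times over its configured limit the MDS cache is."""
        if self.cache_used_gb is None or not self.cache_limit_gb:
            return None
        return self.cache_used_gb / self.cache_limit_gb


def parse_health_detail(text: str) -> MdsBacklog:
    """Extract trimming and cache figures from `ceph health detail` text."""
    # Long lines may be wrapped by terminals or mail clients, so collapse
    # all whitespace into single spaces before matching.
    flat = " ".join(text.split())
    backlog = MdsBacklog()
    trim = TRIM_RE.search(flat)
    if trim:
        backlog.num_segments = int(trim.group(1))
        backlog.max_segments = int(trim.group(2))
    cache = CACHE_RE.search(flat)
    if cache:
        backlog.cache_used_gb = float(cache.group(1))
        backlog.cache_limit_gb = float(cache.group(2))
    return backlog
```

The text can be captured with `subprocess.run(["ceph", "health", "detail"], capture_output=True, text=True).stdout`; sampling it every few minutes would at least show whether the segment count and cache size are changing while rank 0 sits in replay. Matching the plain-text output keeps the sketch tied to exactly what was posted here; the `--format json` variant of the command would likely be more robust, but its layout is not shown in this thread.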