Just a question: is it possible to block or disable all clients? Just to prevent load on the system.

Kind regards,
Sake

> On 22-04-2024 20:33 CEST, Erich Weiler <weiler@xxxxxxxxxxxx> wrote:
>
> I also see this from 'ceph health detail':
>
> # ceph health detail
> HEALTH_WARN 1 filesystem is degraded; 1 MDSs report oversized cache; 1 MDSs behind on trimming
> [WRN] FS_DEGRADED: 1 filesystem is degraded
>     fs slugfs is degraded
> [WRN] MDS_CACHE_OVERSIZED: 1 MDSs report oversized cache
>     mds.slugfs.pr-md-01.xdtppo(mds.0): MDS cache is too large (19GB/8GB); 0 inodes in use by clients, 0 stray files
> [WRN] MDS_TRIM: 1 MDSs behind on trimming
>     mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (127084/250) max_segments: 250, num_segments: 127084
>
> MDS cache too large? The mds process is taking up 22GB right now and
> starting to swap my server, so maybe it somehow is too large....
>
> On 4/22/24 11:17 AM, Erich Weiler wrote:
> > Hi All,
> >
> > We have a somewhat serious situation where we have a cephfs filesystem
> > (18.2.1), and 2 active MDSs (one standby). I tried to restart one of
> > the active daemons to unstick a bunch of blocked requests, and the
> > standby went into 'replay' for a very long time. Then RAM on that MDS
> > server filled up, and it just stayed there for a while, then eventually
> > appeared to give up and switched to the standby, but the cycle started
> > again. So I restarted that MDS, and now I'm in a situation where I see
> > this:
> >
> > # ceph fs status
> > slugfs - 29 clients
> > ======
> > RANK  STATE    MDS                     ACTIVITY  DNS    INOS   DIRS   CAPS
> >  0    replay   slugfs.pr-md-01.xdtppo            3958k  57.1k  12.2k  0
> >  1    resolve  slugfs.pr-md-02.sbblqq            0      3      1      0
> >        POOL          TYPE      USED  AVAIL
> > cephfs_metadata      metadata  997G  2948G
> > cephfs_md_and_data   data      0     87.6T
> > cephfs_data          data      773T  175T
> > STANDBY MDS
> > slugfs.pr-md-03.mclckv
> > MDS version: ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)
> >
> > It just stays there indefinitely. All my clients are hung. I tried
> > restarting all MDS daemons and they just went back to this state after
> > coming back up.
> >
> > Is there any way I can somehow escape this state of indefinite
> > replay/resolve?
> >
> > Thanks so much! I'm kinda nervous since none of my clients have
> > filesystem access at the moment...
> >
> > cheers,
> > erich
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
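
For anyone following along who wants to watch whether the trimming backlog shrinks between checks, here is a minimal Python sketch that pulls the two figures out of the 'ceph health detail' text quoted in this thread. It is only an illustration: the function names are made up for this example, the sample text is pasted from the thread rather than read from a live cluster, and the regular expressions assume the exact wording and GB units shown above, which may differ in other situations or Ceph versions.

```python
import re

# Sample text copied from the 'ceph health detail' output quoted in this thread.
HEALTH_DETAIL = """\
[WRN] MDS_CACHE_OVERSIZED: 1 MDSs report oversized cache
    mds.slugfs.pr-md-01.xdtppo(mds.0): MDS cache is too large (19GB/8GB); 0 inodes in use by clients, 0 stray files
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (127084/250) max_segments: 250, num_segments: 127084
"""


def parse_trim_backlog(text):
    """Return (num_segments, max_segments) from 'Behind on trimming (N/M)', or None."""
    match = re.search(r"Behind on trimming \((\d+)/(\d+)\)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_cache_size(text):
    """Return (used_gb, limit_gb) from 'MDS cache is too large (XGB/YGB)', or None.

    Assumption: both values are reported in whole GB, as in the thread.
    """
    match = re.search(r"MDS cache is too large \((\d+)GB/(\d+)GB\)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    backlog = parse_trim_backlog(HEALTH_DETAIL)
    cache = parse_cache_size(HEALTH_DETAIL)
    if backlog is not None:
        num_segments, max_segments = backlog
        ratio = num_segments / max_segments
        print(f"journal segments: {num_segments} (limit {max_segments}, {ratio:.0f}x over)")
    if cache is not None:
        used_gb, limit_gb = cache
        print(f"MDS cache: {used_gb} GB used, {limit_gb} GB limit")
```

Running the parser on successive captures of the health output and comparing num_segments would show whether replay is making progress or the backlog is still growing.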