On Thu, May 27, 2021 at 12:38:07PM +0200, Mark Schouten wrote:
> On Thu, May 27, 2021 at 06:25:44AM +0000, Martin Rasmus Lundquist Hansen wrote:
> > After scaling the number of MDS daemons down, we now have a daemon stuck
> > in the "up:stopping" state. The documentation says it can take several
> > minutes to stop the daemon, but it has been stuck in this state for
> > almost a full day. According to the "ceph fs status" output attached
> > below, it still holds information about 2 inodes, which we assume is
> > the reason why it cannot stop completely.
> >
> > Does anyone know what we can do to finally stop it?
>
> I have no clients, and it still does not want to stop rank 1. Funny
> thing is, while trying to fix this by restarting MDSes, I sometimes see
> a list of clients popping up in the dashboard, even though no clients
> are connected.

Configuring debug logging shows me the following:
https://p.6core.net/p/rlMaunS8IM1AY5E58uUB6oy4

I have quite a lot of hardlinks on this filesystem, which I have seen cause
'No space left on device' errors. I have mds_bal_fragment_size_max set to
200000 to mitigate that.

The message 'waiting for strays to migrate' makes me feel like I should
push the MDS to migrate them somehow. But how?

-- 
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | info@xxxxxxxx
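
One way to see whether the stray count on the stopping rank is draining at
all is to watch the MDS perf counters, which expose the stray totals under
the mds_cache section. A minimal sketch, assuming shell access on the node
running the stopping daemon (mds.b is a hypothetical name here):

    # Watch the stray counters on the stopping MDS; a num_strays value
    # that stays above zero matches the 'waiting for strays to migrate'
    # message in the debug log.
    ceph daemon mds.b perf dump | grep -i stray

    # The fragment-size workaround mentioned above, applied to all MDS
    # daemons via the central config store.
    ceph config set mds mds_bal_fragment_size_max 200000

Re-running the grep while the rank sits in up:stopping shows whether the
daemon is making any progress or is genuinely stuck.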
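
As for pushing the strays along: stray entries that still have hard links
pointing at them are reintegrated when one of the remaining links is looked
up, so a client-side pass over the hardlinked trees may nudge them. This is
a sketch only, assuming a kernel or FUSE mount at /mnt/cephfs (hypothetical
path):

    # Force a lookup on every file that still has multiple hard links;
    # reintegration of the matching stray entries should follow the
    # stat calls, after which the rank can finish stopping.
    find /mnt/cephfs -type f -links +1 -exec stat {} + > /dev/null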