Re: MDS stuck in rejoin

On 7/24/23 23:02, Frank Schilder wrote:
Hi Xiubo,

I seem to have gotten your e-mail twice.

It's a very old kclient. It was in that state when I came to work in the morning, and I looked at it in the afternoon. I was hoping the problem would clear by itself.

Okay.

One correction for my last comments:

"This means in the client side the oldest client has been stuck too long," --> "This means in the client side the oldest REQUEST has been stuck too long,"


It was probably a compute job that crashed it; it's a compute node in our HPC cluster. I didn't look at dmesg, but I can take a look at the MDS log when I'm back at work. I guess you are interested in the time before the warning started appearing.

Yeah. But I am more interested in the kclient side logs. Just want to know why that oldest request got stuck so long.

Maybe there is a bug where the old kernel couldn't clear the request or advance the tid in corner cases. There is an old tracker, https://tracker.ceph.com/issues/22885, which looks like the same issue but is not resolved yet.
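For anyone hitting this, one way to see the stuck request from the kclient side (rather than the MDS log) is the kernel client's debugfs interface. This is a hypothetical sketch, not a command from the thread; it assumes debugfs is mounted at /sys/kernel/debug and that you are root on the client node:

```shell
# Each kernel ceph mount gets a directory under /sys/kernel/debug/ceph/;
# its "mdsc" file lists in-flight MDS requests with their tids. A tid
# that never disappears across repeated reads is the stuck request the
# MDS_CLIENT_OLDEST_TID warning is complaining about.
found=0
for d in /sys/kernel/debug/ceph/*/; do
    [ -d "$d" ] || continue              # glob matched nothing
    found=1
    echo "== $d"
    cat "$d/mdsc" 2>/dev/null            # one line per pending MDS request
done
[ "$found" -eq 1 ] || echo "no kernel ceph mounts found"
```

On a node with no kernel ceph mounts (or without root) the loop simply prints the fallback line instead of failing.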


The MDS was probably stuck for 5-10 minutes. That is very long on our cluster; an MDS failover usually takes only about 30s. Also, I couldn't see any progress in the MDS log, which is why I decided to fail the rank a second time.

Correct. Most likely a bug on the client side.

Thanks

- Xiubo

During the time it was stuck in rejoin it didn't log any messages other than the MDS map update messages. I don't think it was doing anything at all.

Some background: Before I started recovery I found an old post stating that the MDS_CLIENT_OLDEST_TID warning is quite serious, because the affected MDS will experience growing cache allocation and eventually fail in a practically irrecoverable state. I did not observe unusual RAM consumption and there were no MDS large-cache messages either. It seems our situation was of a more harmless nature. Still, the fail did not go entirely smoothly.
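For reference, the cache growth described in that old post should be observable before it becomes irrecoverable. A hedged sketch of how one might watch for it (exact output fields vary by release; this cluster is Octopus 15.2.17):

```shell
# Current cache usage vs. mds_cache_memory_limit for the active MDS
# ("ceph-10" is the daemon name from the log excerpt below):
ceph tell mds.ceph-10 cache status

# A runaway cache would also surface as an MDS_CACHE_OVERSIZED health
# warning alongside MDS_CLIENT_OLDEST_TID:
ceph health detail | grep -i cache
```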

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Xiubo Li <xiubli@xxxxxxxxxx>
Sent: Friday, July 21, 2023 1:32 PM
To: ceph-users@xxxxxxx
Subject:  Re: MDS stuck in rejoin


On 7/20/23 22:09, Frank Schilder wrote:
Hi all,

we had a client with the warning "[WRN] MDS_CLIENT_OLDEST_TID: 1 clients failing to advance oldest client/flush tid". I looked at the client and there was nothing going on, so I rebooted it. After the client came back, the message was still there. To clean this up I failed the MDS. Unfortunately, the MDS that took over remained stuck in rejoin without doing anything. All that happened in the log was:
BTW, are you using the kclient or the user-space client? How long was the MDS stuck in the rejoin state?

This means in the client side the oldest client has been stuck too long; maybe under heavy load too many requests were generated in a short time and the oldest request was stuck too long in the MDS.


[root@ceph-10 ceph]# tail -f ceph-mds.ceph-10.log
2023-07-20T15:54:29.147+0200 7fedb9c9f700  1 mds.2.896604 rejoin_start
2023-07-20T15:54:29.161+0200 7fedb9c9f700  1 mds.2.896604 rejoin_joint_start
2023-07-20T15:55:28.005+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to version 896614 from mon.4
2023-07-20T15:56:00.278+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to version 896615 from mon.4
[...]
2023-07-20T16:02:54.935+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to version 896653 from mon.4
2023-07-20T16:03:07.276+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to version 896654 from mon.4
Did you see any slow-request logs in the MDS log files? And any other suspect logs in dmesg, if it's a kclient?


After some time I decided to give another fail a try, and this time the replacement daemon went to active state really fast.

If I have a message like the above, what is the clean way of getting the client clean again (version: 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable))?
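A hedged sketch of the steps in question as ceph CLI calls (the session id below is a made-up example; a middle step that is less disruptive than failing the whole rank is evicting just the stuck session):

```shell
# Names the client id behind the MDS_CLIENT_OLDEST_TID warning:
ceph health detail

# Map that client id to a hostname/mount point:
ceph tell mds.0 session ls

# If rebooting the client doesn't clear the warning, evict just that
# session (id=4305 is a placeholder, take it from session ls output):
ceph tell mds.0 session evict id=4305

# Last resort, as done in this thread (rank 2 per the log excerpt):
ceph mds fail 2
```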
I think your steps are correct.

Thanks

- Xiubo


Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




