Hi Frank,
not sure if this has already been mentioned, but this one has a
60-second timeout:
mds_beacon_mon_down_grace
ceph config help mds_beacon_mon_down_grace
mds_beacon_mon_down_grace - tolerance in seconds for missed MDS beacons to monitors
  (secs, advanced)
  Default: 60
  Can update at runtime: true
  Services: [mon]
Maybe bumping that up could help here?
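Something like the following should do it (just a sketch; the value of
300 is only an example, pick whatever covers your startup time, and the
option is read by the mons, hence the "mon" target):

  ceph config set mon mds_beacon_mon_down_grace 300
  ceph config get mon mds_beacon_mon_down_grace

Setting it back to 60 afterwards works the same way.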
Quoting Frank Schilder <frans@xxxxxx>:
And another small piece of information:
I needed to do another restart. This time I managed to capture the
approximate length of the period during which the MDS is up and
responsive after loading the cache (it reports stats). It's pretty
much exactly 60 seconds. This smells like a timeout. Is there any
MDS/CephFS-related timeout with a 60-second default somewhere?
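(One way to hunt for candidates, assuming the placeholder daemon name
below is adjusted and accepting that grepping for a bare 60 will also
catch unrelated options, would be something like:

  ceph config show-with-defaults mds.<id> | grep -w 60

which dumps all settings, including defaults, for the given daemon.)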
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: Saturday, January 11, 2025 12:46 PM
To: Dan van der Ster
Cc: Bailey Allison; ceph-users@xxxxxxx
Subject: Re: Help needed, ceph fs down due to large stray dir
Hi all,
my hopes are down again. The MDS might look busy, but I'm not sure
it's doing anything interesting. I now see a lot of these in the log
(heartbeat messages stripped):
2025-01-11T12:35:50.712+0100 7ff888375700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2025-01-11T11:35:50.713867+0100)
2025-01-11T12:35:51.712+0100 7ff888375700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2025-01-11T11:35:51.714027+0100)
2025-01-11T12:35:52.712+0100 7ff888375700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2025-01-11T11:35:52.714335+0100)
2025-01-11T12:35:53.084+0100 7ff88cb7e700 0 auth: could not find secret_id=51092
2025-01-11T12:35:53.084+0100 7ff88cb7e700 0 cephx: verify_authorizer could not get service secret for service mds secret_id=51092
2025-01-11T12:35:53.353+0100 7ff88cb7e700 0 auth: could not find secret_id=51092
2025-01-11T12:35:53.353+0100 7ff88cb7e700 0 cephx: verify_authorizer could not get service secret for service mds secret_id=51092
2025-01-11T12:35:53.536+0100 7ff88cb7e700 0 auth: could not find secret_id=51092
2025-01-11T12:35:53.536+0100 7ff88cb7e700 0 cephx: verify_authorizer could not get service secret for service mds secret_id=51092
2025-01-11T12:35:53.573+0100 7ff88cb7e700 0 auth: could not find secret_id=51092
2025-01-11T12:35:53.573+0100 7ff88cb7e700 0 cephx: verify_authorizer could not get service secret for service mds secret_id=51092
Looks like the auth key for the MDS expired and cannot be renewed.
Is there a grace period for that as well?
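(A sketch of what could be checked here, assuming stock cephx option
names and that the clocks are the first suspect:

  # do the mons agree on the time?
  ceph time-sync-status
  # on the MDS host, compare against NTP
  chronyc tracking   # or: ntpq -p
  # TTLs behind the rotating service keys
  ceph config get mon auth_service_ticket_ttl
  ceph config get mon auth_mon_ticket_ttl

If the clocks turn out to be fine, the messages might simply mean the
rotating keys could not be refreshed while the MDS was stalled.)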
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: Saturday, January 11, 2025 11:41 AM
To: Dan van der Ster
Cc: Bailey Allison; ceph-users@xxxxxxx
Subject: Re: Help needed, ceph fs down due to large stray dir
Hi all,
new update: after some sleep following the final MDS restart, the MDS
is doing something! It is still unresponsive, but it does show a CPU
load of between 150 and 200%, and I really, really hope that this is
the trimming of stray items.
I will try to find out whether I can get perf to work inside the
container. For now, to facilitate troubleshooting, I will add a swap
disk to every MDS host, just to be on the safe side in case things
fail over.
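(Side note, in case it saves time: perf does not necessarily have to
run inside the container. A sketch of sampling the containerized MDS
from the host, assuming pgrep finds exactly one ceph-mds and that
matching debug symbols are available for useful symbol resolution:

  # live view of where the MDS spends CPU time
  perf top -p $(pgrep -x ceph-mds)
  # or record for a minute and inspect afterwards
  perf record -g -p $(pgrep -x ceph-mds) -- sleep 60
  perf report
)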
Just to get my hopes back: can someone (from the dev team) let me
know if it is expected that an MDS is unresponsive during stray
evaluation?
Thanks and best regards!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx