Re: Help needed, ceph fs down due to large stray dir

Hi all,

my hopes are down again. The MDS might look busy, but I'm not sure it's doing anything useful. I now see a lot of these in the log (heartbeat messages stripped):

2025-01-11T12:35:50.712+0100 7ff888375700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2025-01-11T11:35:50.713867+0100)
2025-01-11T12:35:51.712+0100 7ff888375700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2025-01-11T11:35:51.714027+0100)
2025-01-11T12:35:52.712+0100 7ff888375700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2025-01-11T11:35:52.714335+0100)
2025-01-11T12:35:53.084+0100 7ff88cb7e700  0 auth: could not find secret_id=51092
2025-01-11T12:35:53.084+0100 7ff88cb7e700  0 cephx: verify_authorizer could not get service secret for service mds secret_id=51092
2025-01-11T12:35:53.353+0100 7ff88cb7e700  0 auth: could not find secret_id=51092
2025-01-11T12:35:53.353+0100 7ff88cb7e700  0 cephx: verify_authorizer could not get service secret for service mds secret_id=51092
2025-01-11T12:35:53.536+0100 7ff88cb7e700  0 auth: could not find secret_id=51092
2025-01-11T12:35:53.536+0100 7ff88cb7e700  0 cephx: verify_authorizer could not get service secret for service mds secret_id=51092
2025-01-11T12:35:53.573+0100 7ff88cb7e700  0 auth: could not find secret_id=51092
2025-01-11T12:35:53.573+0100 7ff88cb7e700  0 cephx: verify_authorizer could not get service secret for service mds secret_id=51092

Looks like the auth key for the MDS expired and cannot be renewed. Is there a grace period for that as well?
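A quick sanity check (assuming GNU date) on the timestamps in the messages above: the expiry time in "expired way too early (before ...)" sits almost exactly one hour behind the log line's own timestamp, which matches the default cephx service ticket lifetime (auth_service_ticket_ttl = 3600 s). So this may just mean the rotating keys were not renewed in time while the daemon was busy, rather than the clock actually being wrong:

```shell
# Compare the log line's timestamp against the "before ..." expiry timestamp
# (fractional seconds dropped; values copied from the log above).
log_ts="2025-01-11 12:35:50 +0100"      # timestamp of the log line
expiry_ts="2025-01-11 11:35:50 +0100"   # "before ..." expiry in the message
skew=$(( $(date -d "$log_ts" +%s) - $(date -d "$expiry_ts" +%s) ))
echo "apparent offset: ${skew}s"        # prints: apparent offset: 3600s
```

To rule out real clock skew, `chronyc tracking` (or `ntpq -p`) on the MON/MDS hosts and `ceph time-sync-status` from a client show whether the cluster clocks actually agree.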

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: Saturday, January 11, 2025 11:41 AM
To: Dan van der Ster
Cc: Bailey Allison; ceph-users@xxxxxxx
Subject:  Re: Help needed, ceph fs down due to large stray dir

Hi all,

new update: after a night's sleep following the final MDS restart, the MDS is doing something! It is still unresponsive, but it shows a CPU load of 150-200%, and I really, really hope that this is the trimming of stray items.

I will try to find out if I can get perf to work inside the container. For now, to facilitate troubleshooting, I will add a swap disk to every MDS host, just to be on the safe side in case things fail over.
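For reference, a minimal sketch of the swap precaution, assuming a host with enough free disk; the size and path are placeholders, and the commands need root:

```shell
# Create a swap file as an OOM safety net for an MDS host that may have to
# replay a huge stray dir after a failover (size/path are placeholders).
SWAPFILE=/var/mds-swapfile
fallocate -l 64G "$SWAPFILE"   # reserve space (use dd if the fs lacks fallocate)
chmod 600 "$SWAPFILE"          # swap files must not be world-readable
mkswap "$SWAPFILE"             # write the swap signature
swapon "$SWAPFILE"             # enable it; verify with: swapon --show
```

A dedicated swap disk or partition avoids the filesystem indirection, but a swap file is quicker to add and remove on a running host.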

Just to get my hopes back up: can someone from the dev team let me know whether an MDS is expected to be unresponsive during stray evaluation?

Thanks and best regards!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



