Re: MDS hung in purge_stale_snap_data after populating cache

Frank Schilder <frans@xxxxxx> · Thu, 23 Jan 2025 16:44:54 +0000

Hi all,

with the help of Croit, we got back on our feet. I will post a detailed post-mortem later this month, including information on how to check whether a cluster is in the same situation.

Long story short, we hit a deadlock caused by competition between MDS cache trimming and the purging of stale strays. "Disabling" cache trimming by setting a ridiculously high mds_cache_memory_limit on the affected rank did the trick.

Purging 100 million strays is actually no problem and requires little, if any, RAM by itself. (I mean the purge that happens on MDS restart; I don't know whether the forward-scrub purge behaves the same.) Our cluster purged about 10K items/s, and after a few hours everything was cleaned out. The MDS served client I/O while purging, so the file system was available right away.
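As a rough sanity check of these figures, here is a back-of-the-envelope sketch. It assumes a constant purge rate, which a real cluster will not sustain exactly, and the helper name is hypothetical. At about 10K items/s, 100 million strays take roughly 10,000 seconds, i.e. a little under three hours, which matches "a few hours":

```python
# Back-of-the-envelope estimate of the stray purge duration.
# Assumption: a constant purge rate; the figures are the approximate
# ones from the post (~100 million strays, ~10K items/s).

def purge_duration_hours(num_strays: int, items_per_second: float) -> float:
    """Hypothetical helper: hours needed to purge num_strays at a fixed rate."""
    if items_per_second <= 0:
        raise ValueError("items_per_second must be positive")
    return num_strays / items_per_second / 3600.0


hours = purge_duration_hours(100_000_000, 10_000)
print(f"{hours:.1f} h")  # 10,000 s, i.e. 2.8 h
```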

A big thank you to everyone who helped with this case.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, January 20, 2025 6:49 PM
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: Re: MDS hung in purge_stale_snap_data after populating cache

A colleague of mine suggested creating a core dump once the MDS has
become stale and then inspecting it with gdb. But if you think
increasing the buffer is more promising, or quicker to test, then
do that first.

Quoting Frank Schilder <frans@xxxxxx>:

>> which is 3758096384. I'm not even sure what the unit is, probably bytes?
>
> Sorry, it is bytes. Our items are about 100 bytes on average, which is
> why we observe approximately 37462448 executions of
> purge_stale_snap_data until the queue is filled up.
>
> ________________________________________
> From: Frank Schilder <frans@xxxxxx>
> Sent: Monday, January 20, 2025 1:51 PM
> To: Eugen Block
> Cc: ceph-users@xxxxxxx
> Subject: Re: MDS hung in purge_stale_snap_data after populating cache
>
>> which is 3758096384. I'm not even sure what the unit is, probably bytes?
>
> As far as I understand, the unit is "list items". They can have
> variable length. On our system, about 400G are allocated while
> filling up the bufferlist.
>

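The numbers in this exchange are consistent with each other. Here is a minimal sketch, assuming every purge_stale_snap_data execution appends roughly the same number of bytes to the bufferlist; the helper names are hypothetical. The 3758096384-byte limit is exactly 3.5 GiB. Items of 100 bytes would fill it after about 37.6 million executions, and the observed 37462448 executions imply an average of about 100.3 bytes per item:

```python
# Relates the bufferlist limit quoted in the thread to the number of
# purge_stale_snap_data executions observed before the queue filled up.
# Assumption: each execution appends about the same number of bytes,
# so executions ~= limit / average_item_size.

LIMIT_BYTES = 3_758_096_384       # limit quoted in the thread
OBSERVED_EXECUTIONS = 37_462_448  # executions observed until the queue filled


def expected_executions(limit_bytes: int, avg_item_bytes: float) -> int:
    """Hypothetical helper: approximate number of items fitting in limit_bytes."""
    return int(limit_bytes // avg_item_bytes)


def implied_item_size(limit_bytes: int, executions: int) -> float:
    """Hypothetical helper: average bytes per item implied by an observed count."""
    return limit_bytes / executions


print(LIMIT_BYTES / 2**30)                    # 3.5 (GiB)
print(expected_executions(LIMIT_BYTES, 100))  # 37580963
print(round(implied_item_size(LIMIT_BYTES, OBSERVED_EXECUTIONS), 1))  # 100.3
```
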
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx