Hi Dan,

Thanks for your continued help, I really appreciate it. Just to clarify:

> Your MDS is burning CPU (you see that with top) but it's unresponsive.

Did you mean "is *not* burning CPU"? The MDS is idle - *no* CPU load, yet unresponsive. See below for a more detailed description of my observations.

For the investigation you asked for, I need to run perf inside the container, but I get the error "No permission to enable cycles:u event." Do you know how I can get perf to work inside the docker container? I use the official image from quay.io and run it with privileged=true.

New observations: I had to bring up the OSDs on the host and can now confirm that the heartbeat failure is not related to swapping. This time the MDS had to start swapping much earlier, and it just continues to fill the cache. It is slower this time, but it makes good progress. The MDS loads cache items until it's done (it always stops at about the same number, which slowly decreases with each restart; similarly, the reported stray count goes down a bit with every restart) and then serves a few requests. Very shortly after that, the request rate drops to 0 requests/s (dashboard), the heartbeat messages show up in the log, and the MDS stops responding to daemon queries.