Success!

I remembered I had a server I'd taken out of the cluster to investigate some issues, which had some good-quality 800GB Intel DC SSDs. I dedicated an entire drive to swap, tuned up min_free_kbytes, added an MDS to that server and let it run. It took 3-4 hours, but it eventually came back online. It used the 128GB of RAM and about 250GB of the swap.

Dan, thanks so much for steering me down this path. I would more than likely have started hacking away at the journal otherwise!

Frank, thanks for pointing me towards that other thread. I used your min_free_kbytes tip.

I now need to consider updating. I wonder whether the risk-averse CephFS operator would go for the latest Nautilus or the latest Octopus. It used to be that the newest CephFS code was also the most stable, but I don't know if that's still the case.

Thanks again,
David

On Thu, Oct 22, 2020 at 7:06 PM Frank Schilder <frans@xxxxxx> wrote:
>
> The post was titled "mds behind on trimming - replay until memory exhausted".
>
> > Load up with swap and try the up:replay route.
> > Set the beacon to 100000 until it finishes.
>
> Good point! The MDS will not send beacons for a long time. Same was necessary in the other case.
>
> Good luck!
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
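
For anyone finding this thread later, the steps described in this message (swap on a dedicated drive, a raised min_free_kbytes, and a very long beacon grace while the MDS replays) could be sketched roughly as follows. This is only a dry-run helper that builds the command strings and executes nothing. The function name recovery_plan, the device path /dev/sdX and the min_free_kbytes value are hypothetical placeholders, since the thread does not state them. It is also an assumption that "the beacon" refers to the mds_beacon_grace option and that it is set at the global level.

```python
import shlex


def recovery_plan(swap_device, min_free_kbytes, beacon_grace=100000):
    """Build (but do not run) the shell commands for the recovery steps.

    swap_device     -- block device dedicated entirely to swap (placeholder)
    min_free_kbytes -- value for vm.min_free_kbytes (not given in the thread)
    beacon_grace    -- assumed to map to the mds_beacon_grace option
    """
    if int(min_free_kbytes) <= 0 or int(beacon_grace) <= 0:
        raise ValueError("min_free_kbytes and beacon_grace must be positive")

    dev = shlex.quote(swap_device)  # guard against odd characters in the path
    return [
        # Format the whole drive as swap and enable it (destroys data on it).
        f"mkswap {dev}",
        f"swapon {dev}",
        # Keep more memory free so the kernel can still allocate under pressure.
        f"sysctl -w vm.min_free_kbytes={int(min_free_kbytes)}",
        # Assumption: raise the beacon grace so the replaying MDS is not
        # marked laggy and replaced while it sends no beacons.
        f"ceph config set global mds_beacon_grace {int(beacon_grace)}",
    ]
```

Calling recovery_plan("/dev/sdX", 4194304) returns the four commands in order, so they can be reviewed before anything is run by hand. Once the MDS is back to active, the beacon grace would presumably be returned to its previous value.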