I tried both several times. It looks like it just had to read through the entire journal. I wish there were progress reporting for journal reading at a debug level below 10, because 10 is far too noisy. That would give us an idea of how much longer there is left to go. It seems that the MDS fell far behind on segments (~14,000) because of some misbehaving clients, which caused the journal to explode and the MDS to eventually stop responding to the monitors.

Thank you,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

On Thu, Mar 12, 2020 at 12:48 AM Yan, Zheng <ukernel@xxxxxxxxx> wrote:

> On Thu, Mar 12, 2020 at 1:41 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> >
> > This is the second time this has happened in a couple of weeks. The MDS
> > locks up and the standby can't take over, so the monitors blacklist them.
> > I try to un-blacklist them, but they still say this in the logs:
> >
> > mds.0.1184394 waiting for osdmap 234947 (which blacklists prior instance)
> >
> > Looking at a pg dump, it looks like the epoch is past that.
> >
> > $ ceph pg map 3.756
> > osdmap e234953 pg 3.756 (3.756) -> up [113,180,115] acting [113,180,115]
> >
> > Last time, it seemed to just recover after about an hour all by itself.
> > Any way to speed this up?
> >
>
> Try restarting the standby MDS.
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
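
The epoch check described in the quoted message (the osdmap epoch the MDS is waiting for versus the epoch reported by `ceph pg map`) can be sketched roughly as below. This is only an illustration: the helper names `required_epoch` and `current_epoch` are made up for this example, and the two input strings are copied from the log line and command output quoted above. Note that it only shows whether the cluster has reached the epoch; it does not show that the MDS itself has received that map, which may be why the message persisted.

```python
import re


def required_epoch(mds_log_line):
    """Return the epoch from an MDS 'waiting for osdmap N' log line, or None."""
    m = re.search(r"waiting for osdmap (\d+)", mds_log_line)
    return int(m.group(1)) if m else None


def current_epoch(pg_map_output):
    """Return the epoch from the 'osdmap eN' prefix of `ceph pg map` output, or None."""
    m = re.search(r"osdmap e(\d+)", pg_map_output)
    return int(m.group(1)) if m else None


# Sample inputs copied from the thread (not live cluster output).
log_line = "mds.0.1184394 waiting for osdmap 234947 (which blacklists prior instance)"
pg_map = "osdmap e234953 pg 3.756 (3.756) -> up [113,180,115] acting [113,180,115]"

need = required_epoch(log_line)
have = current_epoch(pg_map)

# True means the cluster-wide osdmap is already at or past the awaited epoch.
print(f"required={need} current={have} reached={have >= need}")
```

With the values from the thread, the cluster epoch (234953) is already past the one the MDS was waiting for (234947), which matches what was observed.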