> Can you tell us more than "it went degraded"? It's hard to know what
> you're seeing.

Sorry, I only know of two states for the cephfs: either it is not
complaining, or it is spitting out a "1 filesystem is degraded" message.
When I said it went degraded, that was based on what I saw in 'ceph -s'.
Usually the symptom is that I can't mount or access the files, though I've
also had that symptom without the degraded message; those cases usually
traced back to something I did wrong. Ceph is still too new to me to debug
much further than that. I'd like to explain the testing scenario behind
this in more detail if you're interested. I'm off to a meeting soon, but I
can email more on this later.

On Thu, Sep 14, 2017 at 2:37 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Thu, 14 Sep 2017, Two Spirit wrote:
>> > The thing that's damaged is a logical mds rank (0), not a physical MDS
>> > daemon. What this is telling you is that there is some serious
>> > corruption in your metadata pool that prevents that particular rank
>> > from starting.
>>
>> Zhang helped identify the bug and put it in the tracker. I didn't fully
>> understand the problem, but it was related to MDS replay not happening
>> and the write_pos being off. He fixed it; after a full scrub was done,
>> the degraded file system came back online. After a couple more hours of
>> stress testing, the file system went back to degraded (earlier today).
>
> Can you tell us more than "it went degraded"? It's hard to know what
> you're seeing.
>
> More generally, can you share what you did with the system that
> originally triggered the unfound object? It ordinarily requires a
> sequence of multiple not-quite-concurrent failures to induce that state,
> and we don't see it much. I'm surprised you're hitting it right off the
> bat.
>
> Thanks!
> sage
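
For reference, the "1 filesystem is degraded" line in 'ceph -s' corresponds
to the FS_DEGRADED health check, so the same state can also be read
programmatically rather than eyeballing the plain output. The Python sketch
below is illustrative only: it assumes a Luminous-era 'ceph -s --format json'
layout (a health.checks map keyed by check code, plus an fsmap section with a
by_rank list), and the exact field names may differ between releases.

#!/usr/bin/env python3
# Rough sketch: ask the cluster for its status as JSON and report whether
# the CephFS looks degraded. The JSON field names used here are assumptions
# based on Luminous-era output; verify against your own
# 'ceph -s --format json' before relying on them.
import json
import subprocess


def cluster_status():
    # Same summary that 'ceph -s' prints, but machine-readable.
    out = subprocess.check_output(["ceph", "-s", "--format", "json"])
    return json.loads(out.decode("utf-8"))


def fs_degraded(status):
    # FS_DEGRADED is the health check behind "1 filesystem is degraded".
    checks = status.get("health", {}).get("checks", {})
    return "FS_DEGRADED" in checks


def mds_states(status):
    # The fsmap lists each active MDS rank and its state
    # (up:active, up:replay, up:rejoin, ...).
    fsmap = status.get("fsmap", {})
    return {d.get("name"): d.get("status") for d in fsmap.get("by_rank", [])}


if __name__ == "__main__":
    st = cluster_status()
    print("filesystem degraded: %s" % fs_degraded(st))
    for name, state in mds_states(st).items():
        print("mds.%s: %s" % (name, state))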