Re: FAILED assert(p.same_interval_since) and unusable cluster

Jon Light <jon@xxxxxxxxxxxx> · Wed, 1 Nov 2017 11:39:07 -0700

I'm currently running 12.2.0. How should I go about applying the patch? Should I upgrade to 12.2.1, apply the changes, and then recompile?
I really appreciate the patch.
Thanks

On Wed, Nov 1, 2017 at 11:10 AM, David Zafman <dzafman@xxxxxxxxxx> wrote:

Jon,

    If you are able, please test my tentative fix for this issue, which is in https://github.com/ceph/ceph/pull/18673

Thanks

David

On 10/30/17 1:13 AM, Jon Light wrote:

Hello,

I have three OSDs that are crashing on start with a FAILED
assert(p.same_interval_since) error. I ran across a thread from a few days
ago about the same issue and a ticket was created here:
http://tracker.ceph.com/issues/21833.

A very overloaded node in my cluster OOM'd many times which eventually led
to the problematic PGs and then the failed assert.

I currently have 49 pgs inactive, 33 pgs down, 15 pgs incomplete as well as
0.028% of objects unfound. Presumably due to this, I can't add any data to
the FS or read some data. Just about any IO ends up in a good bit of stuck
requests.

Hopefully a fix can come from the issue, but can anyone give me some
suggestions or guidance to get the cluster in a working state in the
meantime?

Thanks

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
