On Thu, Sep 23, 2010 at 1:14 AM, Gregory Farnum <gregf@xxxxxxxxxxxxxxx> wrote:
> On Wed, Sep 22, 2010 at 8:29 AM, Henry C Chang
> <henry_c_chang@xxxxxxxxxxxxxxxxxxx> wrote:
>> 1. http://github.com/tcloud/ceph/commit/3c91d0c0e20425f94eaa374f39b2a1f398270255
>>
>> This fixed a monitor bug we hit several times. When some osds report
>> that other osds are down, there is a chance that the leading monitor
>> would enter an infinite loop (you can see the CPU usage of cmon is
>> 100% from "top").
> You diagnosed the problem correctly, but it looks like your patch can
> make the monitor discard the wrong report! Fixed this by sticking an
> "&& i != fail_notes.end()" into the while loop, in
> commit:2c5a3d99aa3be5ce114072e84f73a0a6426e63fd.

With this commit, however, if the while loop exits because
i == fail_notes.end(), the if-else that follows will segfault or abort
when it calls fail_notes.erase(i).
(I wonder if my patch is right.)

Henry
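
To illustrate the concern, here is a minimal sketch, not the actual monitor code. It assumes a hypothetical std::multimap keyed by the reported OSD id with the reporter id as the value; the real type of fail_notes and the helper name remove_report are my assumptions. The point is only that once the loop is allowed to stop at end(), the erase after it needs the same guard, since calling erase() on the end iterator is undefined behavior.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the monitor's failure reports:
// key = OSD reported down, value = OSD that filed the report.
using FailNotes = std::multimap<int, int>;

// Remove the report about `target` filed by `reporter`, if one exists.
// Returns true when a report was erased. (Illustrative helper, not Ceph code.)
bool remove_report(FailNotes& fail_notes, int target, int reporter) {
    auto i = fail_notes.find(target);

    // Check for end() before dereferencing, so the scan cannot run past
    // the last element.
    while (i != fail_notes.end() && i->first == target && i->second != reporter)
        ++i;

    // The loop may have stopped at end() or at a different target, so the
    // erase must be guarded as well. If this test passes, i->second is
    // necessarily equal to reporter.
    if (i != fail_notes.end() && i->first == target) {
        fail_notes.erase(i);
        return true;
    }
    return false;
}
```

With only the loop condition guarded and an unconditional erase afterwards, a lookup for a report that does not exist would reach erase(end()), which matches the crash described above.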