On Wed, 29 Sep 2010, Gregory Farnum wrote: > On Wed, Sep 29, 2010 at 9:29 AM, Henry C Chang > <henry_c_chang@xxxxxxxxxxxxxxxxxxx> wrote: > > On Thu, Sep 23, 2010 at 1:14 AM, Gregory Farnum <gregf@xxxxxxxxxxxxxxx> wrote: > >> On Wed, Sep 22, 2010 at 8:29 AM, Henry C Chang > >> <henry_c_chang@xxxxxxxxxxxxxxxxxxx> wrote: > >>> 1. http://github.com/tcloud/ceph/commit/3c91d0c0e20425f94eaa374f39b2a1f398270255 > >>> > >>> This fixed a monitor bug we hit several times. When some osds report > >>> that other osds are down, > >>> there is a chance that the leading monitor would enter an infinite loop (you can > >>> see the CPU usage of cmon is 100% from "top"). > >> You diagnosed the problem correctly, but it looks like your patch can > >> make the monitor discard the wrong report! Fixed this by sticking an > >> "&& i != fail_notes.end()" into the while loop, in > >> commit:2c5a3d99aa3be5ce114072e84f73a0a6426e63fd. > > > > With this commit, however, if i == fail_notes.end() and leaves the while loop, > > it will cause segfault/abort in the following if-else when > > fail_notes.erase(i) is called. > > (I wonder if my patch is right.) > Whoops, I missed that in the branching. > The basic problem here is that if you reach end() then attempting to > deref either of the values returns 0 -- which in this case can be a > valid value. So, also, check for end() in the if clause. > commit:ba39fd782375dc1bc909519fa3f5307a9696032b I think the order in the if () should be reversed... it's not legal to dereference the iterator if it's == end(). Thanks! sage -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html