Re: Several patches for CEPH

Henry C Chang <henry_c_chang@xxxxxxxxxxxxxxxxxxx> · Thu, 30 Sep 2010 00:29:22 +0800

On Thu, Sep 23, 2010 at 1:14 AM, Gregory Farnum <gregf@xxxxxxxxxxxxxxx> wrote:
> On Wed, Sep 22, 2010 at 8:29 AM, Henry C Chang
> <henry_c_chang@xxxxxxxxxxxxxxxxxxx> wrote:
>> 1. http://github.com/tcloud/ceph/commit/3c91d0c0e20425f94eaa374f39b2a1f398270255
>>
>> This fixed a monitor bug we hit several times. When some osds report
>> that other osds are down, there is a chance that the leading monitor
>> would enter an infinite loop (you can see that the CPU usage of cmon
>> is 100% in "top").
> You diagnosed the problem correctly, but it looks like your patch can
> make the monitor discard the wrong report! Fixed this by sticking an
> "&& i != fail_notes.end()" into the while loop, in
> commit:2c5a3d99aa3be5ce114072e84f73a0a6426e63fd.

With this commit, however, if the while loop exits because
i == fail_notes.end(), the following if-else will segfault or abort
when it calls fail_notes.erase(i).
(I wonder whether my patch is right.)
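
To illustrate the concern, here is a minimal standalone sketch, not the
actual monitor code. The FailNotes type, its key/value meaning, and the
discard_report() helper are assumptions made up for this example. It
shows a scan bounded by end(), followed by a check that prevents calling
erase() on the end iterator, which is undefined behavior for standard
containers.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the monitor's failure-report table:
// key = osd reported as failed, value = osd that sent the report.
using FailNotes = std::multimap<int, int>;

// Remove the report about `target` sent by `reporter`, if one exists.
// Returns true if an entry was erased, false otherwise.
bool discard_report(FailNotes& fail_notes, int target, int reporter) {
  auto i = fail_notes.lower_bound(target);

  // Bounded scan: stop at end(), when we leave the range of entries
  // for `target`, or when we find the matching reporter.
  while (i != fail_notes.end() && i->first == target &&
         i->second != reporter)
    ++i;

  // The loop can exit with i == end(); erase(end()) is undefined
  // behavior, so bail out before touching the container.
  if (i == fail_notes.end() || i->first != target)
    return false;

  assert(i->second == reporter);  // i is a valid, matching entry here
  fail_notes.erase(i);
  return true;
}
```

With this shape, a report that is not in the table is simply ignored
instead of reaching erase() with an invalid iterator.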

Henry