On Wed, Sep 22, 2010 at 8:29 AM, Henry C Chang <henry_c_chang@xxxxxxxxxxxxxxxxxxx> wrote:
> 1. http://github.com/tcloud/ceph/commit/3c91d0c0e20425f94eaa374f39b2a1f398270255
>
> This fixed a monitor bug we hit several times. When some osds report
> that other osds are down, there is a chance that the leading monitor
> would enter an infinite loop (you can see the CPU usage of cmon is
> 100% from "top").

You diagnosed the problem correctly, but it looks like your patch can
make the monitor discard the wrong report! I fixed this by sticking an
"&& i != fail_notes.end()" into the while loop, in
commit:2c5a3d99aa3be5ce114072e84f73a0a6426e63fd.

> ps. I found that the master branch does not contain this piece of
> code. Is there any concern?

Counting of failure notes, and the ability to discard them, is a new
feature to help prevent flapping. It didn't get into .21 but will go
into the next release.
-Greg
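For readers unfamiliar with the bug class: a loop that walks a container looking for one specific failure report will spin or run off the end if no matching report exists, unless the loop condition is bounded by end(). Below is a minimal C++ sketch of that bounded-search pattern, not the actual Ceph monitor code. The type FailNotes, the functions discard_failure_note and count_failure_notes, and the layout of fail_notes as a std::multimap from failed OSD id to reporting OSD id are all hypothetical, chosen only to illustrate the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical layout: failed OSD id -> id of the OSD that reported it down.
using FailNotes = std::multimap<int, int>;

// Erase exactly the note in which `reporter` claimed `failed` was down.
// Returns true if such a note existed and was removed, false otherwise.
bool discard_failure_note(FailNotes& fail_notes, int failed, int reporter) {
  assert(failed >= 0 && reporter >= 0);  // OSD ids are non-negative
  auto i = fail_notes.lower_bound(failed);
  // Test i against end() *before* dereferencing it: && evaluates left to
  // right and short-circuits, so i->first is never read past the end.
  // This bound is what guarantees termination when no note matches.
  while (i != fail_notes.end() && i->first == failed) {
    if (i->second == reporter) {  // only discard the matching report
      fail_notes.erase(i);
      return true;
    }
    ++i;
  }
  return false;
}

// Number of notes currently held against `failed` (one per report).
std::size_t count_failure_notes(const FailNotes& fail_notes, int failed) {
  return fail_notes.count(failed);
}
```

For example, starting from the notes {{3, 1}, {3, 2}, {5, 1}}, discarding (3, 2) succeeds and leaves one note against OSD 3, while discarding (3, 7) returns false and the loop stops at the note for OSD 5 instead of running on.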