On 05/14/2013 09:23 PM, Chen, Xiaoxi wrote:
>> How responsive generally is the machine under load? Is there available CPU?
> The machine works well, and the affected OSDs are likely the same ones each time. This seems to be because they have relatively slower disks (the disk type is the same, but the latency is a bit higher: 8 ms -> 10 ms).
>
> top shows no idle %, but there is still 30+% iowait; my colleague tells me that iowait can be treated as free.
>
> Another piece of information: offloading the heartbeat to the 1Gb NIC doesn't solve the problem. What's more, when we run a random write test, we can still see this flapping happen. So I would say it may be related to the CPU scheduler? The heartbeat thread (in a busy OSD) fails to get enough CPU cycles.
>

FWIW, also take a close look at your monitor daemons, and whether they show any signs of being overloaded. I frequently see OSDs wrongly marked down when my mons cannot keep up with their workload.

-- Jim