The messenger errors probably indicate that that OSD's peers are down. The boost errors are a result of the OSD receiving a log message in the GetInfo state. This indicates a bug in the peering state machine. Is there a way that you could get us more complete logs? I need an idea of what happened to cause the erroneous log message.

Thanks!
-Sam

On 07/11/2011 07:26 AM, huang jun wrote:
Hi all,

I use Ceph v0.30 on 31 OSDs, on Linux 2.6.37. After I set up the whole cluster, many (10) OSDs went down because the cosd process was killed; the OSD log is in the attached file "osd-failed". The same thing happened once a week ago. At first we fixed it by just rebuilding the cluster, but this time we will not use that method; we want to find out what led to this failure. Why did the SimpleMessenger always send RESETSESSION? What led to the boost:recovery failure? Can you give some constructive advice?

Thanks in advance