OK, I've solved this by myself. Since I knew that there was replication
between osd001 and osd005, as well as between osd001 and osd015 and
osd001 and osd012, I decided to take osd005, osd012 and osd015 offline.
After that ceph started to rebuild the PGs on other nodes. Everything is
fine now. (A rough sketch of the commands is in the P.S. at the end of
this mail.)

Regards,
Christian

2011/7/26 Christian Brunner <chb@xxxxxx>:
> Another kernel crash, another invalid ceph state...
>
> A memory allocation failure in the kernel (ixgbe) of one OSD server
> led to a domino effect in our ceph cluster with "0 up, 0 in".
>
> When I restarted the cluster everything came up again. But I still
> have 6 peering PGs:
>
> pg v5898472: 3712 pgs: 3706 active+clean, 6 peering; 745 GB data,
> 775 GB used, 57642 GB / 59615 GB avail
>
> # ceph pg dump -o - | grep peering
> 0.190   22  0 0 0   90112   92274688 200 200 peering 6500'1256   7167'1063  [15,1] [15,1] 6500'1256   2011-07-22 11:22:55.798745
> 3.18d  385  0 0 0 1529498 1566204928 300 300 peering 7013'134376 7167'20162 [15,1] [15,1] 6933'132427 2011-07-22 11:22:56.488471
> 0.4c     9  0 0 0   36864   37748736 200 200 peering 6500'673    7163'1095  [12,1] [12,1] 6500'673    2011-07-22 11:22:20.226119
> 3.49   171  0 0 0  671467  687580272 295 295 peering 7013'10276  7163'2879  [12,1] [12,1] 6933'9455   2011-07-22 11:22:20.701854
> 0.35e    6  0 0 0   24576   25165824 200 200 peering 6500'628    7163'1142  [12,1] [12,1] 6500'628    2011-07-22 11:22:19.267804
> 3.35b  198  0 0 0  791800  810803200 297 297 peering 7013'66727  7163'5759  [12,1] [12,1] 6933'65715  2011-07-22 11:22:20.035265
>
> "ceph pg map" is consistent with "ceph pg dump":
>
> # ceph pg map 0.190
> 2011-07-26 08:46:19.330981 mon <- [pg,map,0.190]
> 2011-07-26 08:46:19.331981 mon1 -> 'osdmap e7273 pg 0.190 (0.190) ->
> up [15,1] acting [15,1]' (0)
>
> But directories of the PGs are present on multiple nodes (for example
> on osd005 for 0.190):
>
> /ceph/osd.001/current/0.190_head
> /ceph/osd.001/snap_1650435/0.190_head
> /ceph/osd.001/snap_1650445/0.190_head
> /ceph/osd.005/current/0.190_head
> /ceph/osd.005/snap_1572317/0.190_head
> /ceph/osd.005/snap_1572323/0.190_head
> /ceph/osd.015/current/0.190_head
> /ceph/osd.015/snap_1467152/0.190_head
>
> Any hint on how to proceed would be great.
>
> Thanks,
> Christian
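P.S.: For anyone who hits the same situation, roughly what "taking the
OSDs offline" amounts to is sketched below. Treat it as an approximation
only: the init-script invocation is the usual way to stop a single osd
daemon but the exact syntax depends on your installation, and the osd
ids are inferred from the host names above.

Stop the osd daemons that still hold the stale copies (run on the
respective hosts):

  # /etc/init.d/ceph stop osd.5    # adjust path/ids to your setup
  # /etc/init.d/ceph stop osd.12
  # /etc/init.d/ceph stop osd.15

Mark them out so the remaining OSDs re-replicate the affected PGs right
away instead of waiting for the down-out interval:

  # ceph osd out 5
  # ceph osd out 12
  # ceph osd out 15

Then watch the cluster until no PGs are left peering:

  # ceph -w
  # ceph pg dump -o - | grep peering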