OK, I've solved this by myself. Since I knew that there was replication
between osd001 and osd005, as well as between osd001 and osd015 and
osd001 and osd012, I decided to take osd005, osd012 and osd015 offline.
After that ceph started to rebuild the PGs on other nodes. Everything is
fine now. (A rough sketch of the commands is in the P.S. at the end of
this mail.)

Regards,
Christian

2011/7/26 Christian Brunner <chb@xxxxxx>:
> Another kernel crash, another invalid ceph state...
>
> A memory allocation failure in the kernel (ixgbe) of one OSD server
> led to a domino effect in our ceph cluster with "0 up, 0 in".
>
> When I restarted the cluster everything came up again. But I still
> have 6 peering PGs:
>
> pg v5898472: 3712 pgs: 3706 active+clean, 6 peering; 745 GB data,
> 775 GB used, 57642 GB / 59615 GB avail
>
> # ceph pg dump -o - | grep peering
> 0.190   22  0 0 0   90112   92274688 200 200 peering 6500'1256   7167'1063  [15,1] [15,1] 6500'1256   2011-07-22 11:22:55.798745
> 3.18d  385  0 0 0 1529498 1566204928 300 300 peering 7013'134376 7167'20162 [15,1] [15,1] 6933'132427 2011-07-22 11:22:56.488471
> 0.4c     9  0 0 0   36864   37748736 200 200 peering 6500'673    7163'1095  [12,1] [12,1] 6500'673    2011-07-22 11:22:20.226119
> 3.49   171  0 0 0  671467  687580272 295 295 peering 7013'10276  7163'2879  [12,1] [12,1] 6933'9455   2011-07-22 11:22:20.701854
> 0.35e    6  0 0 0   24576   25165824 200 200 peering 6500'628    7163'1142  [12,1] [12,1] 6500'628    2011-07-22 11:22:19.267804
> 3.35b  198  0 0 0  791800  810803200 297 297 peering 7013'66727  7163'5759  [12,1] [12,1] 6933'65715  2011-07-22 11:22:20.035265
>
> "ceph pg map" is consistent with "ceph pg dump":
>
> # ceph pg map 0.190
> 2011-07-26 08:46:19.330981 mon <- [pg,map,0.190]
> 2011-07-26 08:46:19.331981 mon1 -> 'osdmap e7273 pg 0.190 (0.190) ->
> up [15,1] acting [15,1]' (0)
>
> But directories of the PGs are present on multiple nodes (for example
> on osd005 for 0.190):
>
> /ceph/osd.001/current/0.190_head
> /ceph/osd.001/snap_1650435/0.190_head
> /ceph/osd.001/snap_1650445/0.190_head
> /ceph/osd.005/current/0.190_head
> /ceph/osd.005/snap_1572317/0.190_head
> /ceph/osd.005/snap_1572323/0.190_head
> /ceph/osd.015/current/0.190_head
> /ceph/osd.015/snap_1467152/0.190_head
>
> Any hint on how to proceed would be great.
>
> Thanks,
> Christian
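P.S.: For anyone who hits the same situation, roughly what "taking the
OSDs offline" amounts to is sketched below. Treat it as an approximation
only: the init-script invocation is the usual way to stop a single osd
daemon but the exact syntax depends on your installation, and the osd
ids are inferred from the host names above.

Stop the osd daemons that still hold the stale copies (run on the
respective hosts):

  # /etc/init.d/ceph stop osd.5    # adjust path/ids to your setup
  # /etc/init.d/ceph stop osd.12
  # /etc/init.d/ceph stop osd.15

Mark them out so the remaining OSDs re-replicate the affected PGs right
away instead of waiting for the down-out interval:

  # ceph osd out 5
  # ceph osd out 12
  # ceph osd out 15

Then watch the cluster until no PGs are left peering:

  # ceph -w
  # ceph pg dump -o - | grep peering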