On Tue, 26 Jul 2011, Christian Brunner wrote:
> OK, I've solved this by myself.
>
> Since I knew that there is replication between
>
> osd001 and osd005,
>
> as well as
>
> osd001 and osd015,
> osd001 and osd012,
>
> I decided to take osd005, osd012 and osd015 offline. After that ceph
> started to rebuild the PGs on other nodes.

At the same time, you mean? Or did you just restart them?

The usual way to debug these situations is:

 - Identify a stuck pg.
 - Figure out which osds it maps to (here, [15,1]).
 - Turn up logging on those nodes:

     ceph osd tell 15 injectargs '--debug-osd 20 --debug-ms 1'
     ceph osd tell 1 injectargs '--debug-osd 20 --debug-ms 1'

 - Restart peering by toggling the primary (the first osd in the list, 15):

     ceph osd down 15

 - Send us the resulting logs (for all nodes).

Even better if you also include in this the other (old) osds that still have
data for the pg (osd1 in your case).

We definitely want to fix the core issue, so any help gathering the logs
would be appreciated!

It's also possible that the above will 'fix' it, because the peering issue is
hard to hit. In that case, cranking up the debug level after the initial
crash, but before you restart everything, might be a good idea.

Thanks!
sage

>
> Everything is fine now.
>
> Regards,
> Christian
>
> 2011/7/26 Christian Brunner <chb@xxxxxx>:
> > Another kernel crash, another invalid ceph state...
> >
> > A memory allocation failure in the kernel (ixgbe) of one OSD server
> > led to a domino effect in our ceph cluster with "0 up, 0 in".
> >
> > When I restarted the cluster, everything came up again. But I still
> > have 6 peering PGs:
> >
> > pg v5898472: 3712 pgs: 3706 active+clean, 6 peering; 745 GB data,
> > 775 GB used, 57642 GB / 59615 GB avail
> >
> > # ceph pg dump -o - | grep peering
> > 0.190   22   0  0  0  90112    92274688    200  200  peering  6500'1256    7167'1063   [15,1]  [15,1]  6500'1256    2011-07-22 11:22:55.798745
> > 3.18d   385  0  0  0  1529498  1566204928  300  300  peering  7013'134376  7167'20162  [15,1]  [15,1]  6933'132427  2011-07-22 11:22:56.488471
> > 0.4c    9    0  0  0  36864    37748736    200  200  peering  6500'673     7163'1095   [12,1]  [12,1]  6500'673     2011-07-22 11:22:20.226119
> > 3.49    171  0  0  0  671467   687580272   295  295  peering  7013'10276   7163'2879   [12,1]  [12,1]  6933'9455    2011-07-22 11:22:20.701854
> > 0.35e   6    0  0  0  24576    25165824    200  200  peering  6500'628     7163'1142   [12,1]  [12,1]  6500'628     2011-07-22 11:22:19.267804
> > 3.35b   198  0  0  0  791800   810803200   297  297  peering  7013'66727   7163'5759   [12,1]  [12,1]  6933'65715   2011-07-22 11:22:20.035265
> >
> > "ceph pg map" is consistent with "ceph pg dump":
> >
> > # ceph pg map 0.190
> > 2011-07-26 08:46:19.330981 mon <- [pg,map,0.190]
> > 2011-07-26 08:46:19.331981 mon1 -> 'osdmap e7273 pg 0.190 (0.190) -> up [15,1] acting [15,1]' (0)
> >
> > But directories for the PGs are present on multiple nodes (for example
> > on osd005 for 0.190):
> >
> > /ceph/osd.001/current/0.190_head
> > /ceph/osd.001/snap_1650435/0.190_head
> > /ceph/osd.001/snap_1650445/0.190_head
> > /ceph/osd.005/current/0.190_head
> > /ceph/osd.005/snap_1572317/0.190_head
> > /ceph/osd.005/snap_1572323/0.190_head
> > /ceph/osd.015/current/0.190_head
> > /ceph/osd.015/snap_1467152/0.190_head
> >
> > Any hint on how to proceed would be great.
> >
> > Thanks,
> > Christian
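
To recap, the debug sequence above collected into a single shell sketch. The
pg id 0.190 and osds 15/1 are just the values from the dump in this thread;
substitute whatever your own 'ceph pg dump' reports as stuck. The log
location is an assumption about a default setup.

    # find the stuck pgs and the osds they map to
    ceph pg dump -o - | grep peering
    ceph pg map 0.190      # prints something like: up [15,1] acting [15,1]

    # crank up logging on those osds
    ceph osd tell 15 injectargs '--debug-osd 20 --debug-ms 1'
    ceph osd tell 1 injectargs '--debug-osd 20 --debug-ms 1'

    # restart peering by toggling the primary (the first osd in the set)
    ceph osd down 15

    # then gather the osd logs from all involved nodes (by default they
    # land under /var/log/ceph/) and send them along

All of the commands are taken verbatim from the messages above.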