On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Hi,
>
> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >>
> >> >> And now it is very weird... I brought osd.37 up and ran a loop:
> >> >> while true; do ceph tell 1.165 query; done
> >>
> >> > I need to explain more here - all I did was start ceph-osd id=37 on the
> >> > storage node; in ceph osd tree this osd is marked as out:
> >>
> >> > -17 21.49995     host stor8
> >> >  22  1.59999         osd.22    up  1.00000          1.00000
> >> >  23  1.59999         osd.23    up  1.00000          1.00000
> >> >  36  2.09999         osd.36    up  1.00000          1.00000
> >> >  37  2.09999         osd.37    up        0          1.00000
> >> >  38  2.50000         osd.38    up  1.00000          1.00000
> >> >  39  2.50000         osd.39    up  1.00000          1.00000
> >> >  40  2.50000         osd.40    up        0          1.00000
> >> >  41  2.50000         osd.41  down        0          1.00000
> >> >  42  2.50000         osd.42    up  1.00000          1.00000
> >> >  43  1.59999         osd.43    up  1.00000          1.00000
> >>
> >> > After starting this osd, ceph tell 1.165 query worked for only one
> >> > call of the command.
> >>
> >> >> Caught this:
> >>
> >> >> https://pastebin.com/zKu06fJn
> >>
> >> Here is the same for pg 1.60:
> >>
> >> https://pastebin.com/Xuk5iFXr
>
> > Look at the bottom, after it says
>
> >     "blocked": "peering is blocked due to down osds",
>
> > Did the 1.165 pg recover?
>
> No, it didn't:
>
> [root@cc1 ~]# ceph health detail
> HEALTH_WARN 1 pgs down; 1 pgs incomplete; 1 pgs peering; 2 pgs stuck inactive
> pg 1.165 is stuck inactive since forever, current state incomplete, last acting [67,88,48]
> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [68]
> pg 1.60 is down+remapped+peering, acting [68]
> pg 1.165 is incomplete, acting [67,88,48]
> [root@cc1 ~]#

Hrm.

 ceph daemon osd.67 config set debug_osd 20
 ceph daemon osd.67 config set debug_ms 1
 ceph osd down 67

and capture the resulting log segment, then post it with ceph-post-file.

sage
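
A minimal sketch of that capture sequence, assuming the commands run on the node hosting osd.67 and that the osd logs to the default /var/log/ceph/ceph-osd.67.log (the path and the tail length are assumptions; adjust for your cluster). Marking the osd down forces it to re-peer, so the peering attempt is recorded at the raised debug levels:

 # raise logging on osd.67 via the admin socket (takes effect immediately)
 ceph daemon osd.67 config set debug_osd 20
 ceph daemon osd.67 config set debug_ms 1

 # mark the osd down so it re-peers and the attempt lands in the log
 ceph osd down 67

 # give peering a little time, then grab the recent log segment and post it
 tail -n 50000 /var/log/ceph/ceph-osd.67.log > /tmp/osd.67-peering.log
 ceph-post-file /tmp/osd.67-peering.log

 # restore the default debug levels afterwards
 ceph daemon osd.67 config set debug_osd 1/5
 ceph daemon osd.67 config set debug_ms 0/5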