On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Hi,
>
> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >>
> >> >> And now it is very weird... I brought osd.37 up and ran a loop:
> >> >> while true; do ceph tell 1.165 query; done
> >>
> >> > I need to explain more here - all I did was start ceph-osd id=37 on the
> >> > storage node; in ceph osd tree this osd is marked as out:
> >>
> >> > -17 21.49995     host stor8
> >> >  22  1.59999         osd.22    up  1.00000          1.00000
> >> >  23  1.59999         osd.23    up  1.00000          1.00000
> >> >  36  2.09999         osd.36    up  1.00000          1.00000
> >> >  37  2.09999         osd.37    up        0          1.00000
> >> >  38  2.50000         osd.38    up  1.00000          1.00000
> >> >  39  2.50000         osd.39    up  1.00000          1.00000
> >> >  40  2.50000         osd.40    up        0          1.00000
> >> >  41  2.50000         osd.41  down        0          1.00000
> >> >  42  2.50000         osd.42    up  1.00000          1.00000
> >> >  43  1.59999         osd.43    up  1.00000          1.00000
> >>
> >> > After starting this osd, ceph tell 1.165 query worked for only one
> >> > call of the command.
> >>
> >> >> Caught this:
> >>
> >> >> https://pastebin.com/zKu06fJn
> >>
> >> Here is the same for pg 1.60:
> >>
> >> https://pastebin.com/Xuk5iFXr
>
> > Look at the bottom, after it says
>
> >     "blocked": "peering is blocked due to down osds",
>
> > Did the 1.165 pg recover?
>
> No, it didn't:
>
> [root@cc1 ~]# ceph health detail
> HEALTH_WARN 1 pgs down; 1 pgs incomplete; 1 pgs peering; 2 pgs stuck inactive
> pg 1.165 is stuck inactive since forever, current state incomplete, last acting [67,88,48]
> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [68]
> pg 1.60 is down+remapped+peering, acting [68]
> pg 1.165 is incomplete, acting [67,88,48]
> [root@cc1 ~]#

Hrm.

 ceph daemon osd.67 config set debug_osd 20
 ceph daemon osd.67 config set debug_ms 1
 ceph osd down 67

and capture the resulting log segment, then post it with ceph-post-file.

sage
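
A minimal sketch of that capture sequence, assuming the commands run on the node hosting osd.67 and that the osd logs to the default /var/log/ceph/ceph-osd.67.log (the path and the tail length are assumptions; adjust for your cluster). Marking the osd down forces it to re-peer, so the peering attempt is recorded at the raised debug levels:

 # raise logging on osd.67 via the admin socket (takes effect immediately)
 ceph daemon osd.67 config set debug_osd 20
 ceph daemon osd.67 config set debug_ms 1

 # mark the osd down so it re-peers and the attempt lands in the log
 ceph osd down 67

 # give peering a little time, then grab the recent log segment and post it
 tail -n 50000 /var/log/ceph/ceph-osd.67.log > /tmp/osd.67-peering.log
 ceph-post-file /tmp/osd.67-peering.log

 # restore the default debug levels afterwards
 ceph daemon osd.67 config set debug_osd 1/5
 ceph daemon osd.67 config set debug_ms 0/5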