pg incomplete second osd in acting set still available

Hi Folks,

One last dip into my old bobtail cluster (new hardware is on order).

I have three pgs in an incomplete state.  The cluster was previously
stable, though in a health warn state due to a few near-full osds.  To
expand space, I started resizing drives on one host after taking the
osds that served them out and down.  My failure domain has two levels,
osds and hosts, and each placement group keeps two copies (size = 2).
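
For reference, the pool's replication settings can be double-checked
with something like this (<poolname> is a placeholder for whatever
pool 3 maps to; osd dump shows min_size too, in case pool get doesn't
support that key on bobtail):

sudo ceph --id nova osd pool get <poolname> size
sudo ceph --id nova osd dump | grep <poolname>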

Here are the three pgs currently flagged incomplete:

root@d90-b1-1c-3a-c4-8f:~# date; sudo ceph --id nova health detail |
grep incomplete
Fri Mar 25 11:28:47 CDT 2016
HEALTH_WARN 168 pgs backfill; 107 pgs backfilling; 241 pgs degraded; 3
pgs incomplete; 3 pgs stuck inactive; 287 pgs stuck unclean; recovery
4913393/39589336 degraded (12.411%);  recovering 120 o/s, 481MB/s; 4
near full osd(s)
pg 3.5 is stuck inactive since forever, current state incomplete, last
acting [53,22]
pg 3.150 is stuck inactive since forever, current state incomplete, last
acting [50,74]
pg 3.38c is stuck inactive since forever, current state incomplete, last
acting [14,70]
pg 3.5 is stuck unclean since forever, current state incomplete, last
acting [53,22]
pg 3.150 is stuck unclean since forever, current state incomplete, last
acting [50,74]
pg 3.38c is stuck unclean since forever, current state incomplete, last
acting [14,70]
pg 3.38c is incomplete, acting [14,70]
pg 3.150 is incomplete, acting [50,74]
pg 3.5 is incomplete, acting [53,22]

Given that incomplete means:

"Ceph detects that a placement group is missing information about writes
that may have occurred, or does not have any healthy copies. If you see
this state, try to start any failed OSDs that may contain the needed
information or temporarily adjust min_size to allow recovery."
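
If it comes to the min_size route, I assume it would look something
like this (<poolname> again a placeholder, and I'd put it back to 2
once the pgs go active+clean):

sudo ceph osd pool set <poolname> min_size 1
# ... let the pgs peer and recover, then restore the original value:
sudo ceph osd pool set <poolname> min_size 2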

I have restarted all osds in these acting sets and they log normally,
opening their respective journals and such. However, the incomplete
state remains.
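
Querying one of the stuck pgs should say what peering is actually
blocked on, e.g.:

sudo ceph --id nova pg 3.5 query

If I'm reading that output right, the recovery_state section at the
bottom is the part that explains what the pg is waiting for.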

All three of the primary osds (53, 50, 14) were reformatted to expand
their size, so I know there's no "spare" journal if that's what the
state is referring to.  Btw, I did take all of the osds out and down
before resizing their drives, so I'm not sure why these pgs would be
expecting an old journal in the first place.

I suspect I need to forgo whatever journal history the cluster thinks
is missing and let the secondaries become primary for these pgs.

I sure hope that's possible.
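
From what I can tell in the docs, the way to tell the cluster to stop
waiting on the reformatted primaries is "ceph osd lost", which is
destructive, so I'd appreciate confirmation before running it.  A
sketch for one of the three (I believe the osd has to be down before
it can be marked lost, and the stop command depends on the init
system):

sudo service ceph stop osd.53
sudo ceph osd lost 53 --yes-i-really-mean-it

and similarly for osds 50 and 14.  After that the secondaries (22, 74,
70) should be able to take over as primary, if I understand peering
correctly.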

As always, thanks for any pointers.

~jpr
