ceph recovery incomplete PGs on Luminous RC

Luminous 12.1.0 (RC)

I replaced two OSD drives (the old ones were still good, just too small), using:

ceph osd out osd.12
ceph osd crush remove osd.12
ceph auth del osd.12
systemctl stop ceph-osd@12
ceph osd rm osd.12

I later found that I should also have unmounted it from /var/lib/ceph/osd-12.
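
If I understand the default ceph-deploy layout correctly, that missing step would have been something like:

umount /var/lib/ceph/osd/ceph-12

with the exact mount point depending on how the OSD was deployed.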

(remove old disk, insert new disk)

I added the new disk/OSD with:

ceph-deploy osd prepare stor-vm3:sdg --bluestore

This automatically activated the OSD (not sure why; I thought it needed a ceph-deploy osd activate as well).
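
I assume the ceph-disk udev rules activated it as soon as prepare finished partitioning the disk; if so, the one-step equivalent would presumably have been something like:

ceph-deploy osd create stor-vm3:sdg --bluestore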


Then, while working on an unrelated issue, I upgraded one node (out of 4 total) to 12.1.1 using apt and rebooted it.

The mon daemon on that node would not form a quorum with the others still on 12.1.0, so instead of troubleshooting that I just went ahead and upgraded the other 3 nodes and rebooted them.
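
(For what it's worth, I believe the mixed-version and quorum state could have been inspected beforehand with something like:

ceph versions
ceph mon stat
ceph quorum_status --format json-pretty

but I didn't stop to do that at the time.)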

Lots of recovery IO went on afterwards, but now things have stopped at:

    pools:   10 pools, 6804 pgs
    objects: 1784k objects, 7132 GB
    usage:   11915 GB used, 19754 GB / 31669 GB avail
    pgs:     0.353% pgs not active
             70894/2988573 objects degraded (2.372%)
             422090/2988573 objects misplaced (14.123%)
             6626 active+clean
             129  active+remapped+backfill_wait
             23   incomplete
             14   active+undersized+degraded+remapped+backfill_wait
             4    active+undersized+degraded+remapped+backfilling
             4    active+remapped+backfilling
             2    active+clean+scrubbing+deep
             1    peering
             1    active+recovery_wait+degraded+remapped
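
That's the data section of ceph -s; to list exactly which PGs are the problem ones, I believe something like this works:

ceph pg dump_stuck inactive
ceph pg ls incomplete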


When I run ceph pg query on the incomplete PGs, they all list at least one of the two removed OSDs (12, 17) in "down_osds_we_would_probe".
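
For example (the PG ID below is just a placeholder):

ceph pg 2.1f query | grep -A3 down_osds_we_would_probe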

Most pools are size=2, min_size=1 (trusting BlueStore to tell me which copy is valid). One pool is size=1, min_size=1, and I'm okay with losing it, except that I had it mounted in a directory on CephFS; I rm'd the directory, but I can't delete the pool because it's "in use by CephFS".
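
I assume the pool would first have to be detached from the filesystem before the delete is allowed, something like the following (pool/fs names are placeholders, and I gather this only works if it isn't the filesystem's default data pool, and that Luminous also wants mon_allow_pool_delete enabled):

ceph fs rm_data_pool <fsname> <poolname>
ceph osd pool delete <poolname> <poolname> --yes-i-really-really-mean-it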


I still have the old drives; can I stick them into another host and re-add them somehow?

This data isn't super important, but I'd like to learn a bit about how to recover when bad things happen, as we are planning a production deployment in a couple of weeks.
