Accidentally Remove OSDs

Dear Ceph experts,

I'm a very new Ceph user. I made a blunder: I removed some OSDs (and deleted all files in the related directories) before Ceph had finished rebalancing data and migrating PGs.

Setting aside the data loss, I am now facing the following problems:

1) There are always stale PGs showing in ceph status (with a health warning). Take one of the stale PGs, 17.a2, as an example:
# ceph -v
ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)

# ceph -s
    cluster 3f81b47e-fb15-4fbb-9fee-0b1986dfd7ea
     health HEALTH_WARN 203 pgs degraded; 366 pgs stale; 203 pgs stuck degraded; 366 pgs stuck stale; 203 pgs stuck unclean; 203 pgs stuck undersized; 203 pgs undersized; 154 requests are blocked > 32 sec; recovery 153738/18991802 objects degraded (0.809%)
     monmap e1: 1 mons at {...=...:6789/0}, election epoch 1, quorum 0 tw-ceph01
     osdmap e3697: 12 osds: 12 up, 12 in
      pgmap v21296531: 1156 pgs, 18 pools, 36929 GB data, 9273 kobjects
            72068 GB used, 409 TB / 480 TB avail
            153738/18991802 objects degraded (0.809%)
                 163 stale+active+clean
                 786 active+clean
                 203 stale+active+undersized+degraded
                   4 active+clean+scrubbing+deep


# ceph pg dump_stuck stale | grep 17.a2
17.a2   0       0       0       0       0       0       0       0       stale+active+clean      2015-04-20 09:16:11.624952     0'0     2718:200        [15,17] 15      [15,17] 15      0'0     2015-04-15 10:42:37.880699    0'0      2015-04-15 10:42:37.880699

# ceph pg repair 17.a2
Error EAGAIN: pg 17.a2 primary osd.15 not up

# ceph pg scrub 17.a2
Error EAGAIN: pg 17.a2 primary osd.15 not up

# ceph pg map 17.a2
osdmap e3695 pg 17.a2 (17.a2) -> up [27,3] acting [27,3]

where osd.15 had already been removed. The PG seems to map to existing OSDs ([27,3]).
Can this PG eventually be recovered by remapping to the existing OSDs? If not, what can I do about this kind of stale PG?
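From what I have read in the troubleshooting docs, one possible way forward (please correct me if this is unsafe) is to tell the cluster that the removed OSD is gone for good and then force the PG to be recreated, empty, on the OSDs it currently maps to. A rough sketch of what I have in mind, with osd.15 standing in for each removed OSD, and assuming osd.15 still exists in the osdmap (I am not sure it does after my removal):

# ceph osd lost 15 --yes-i-really-mean-it
# ceph pg force_create_pg 17.a2

I understand force_create_pg would discard whatever data was in the PG, which I have already accepted as lost.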

2) I tried to solve the problem above by re-creating the OSDs, but failed: I cannot create an OSD with the same ID as the one I removed, say osd.15 (nor change the ID of an existing OSD).
Is there any way to change the ID of an OSD? (By the way, I'm surprised how little can be found about this issue on the internet.)
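In case it matters, my understanding is that ceph osd create always hands out the lowest free ID, so an ID like 15 should only become reusable once every trace of the old OSD is gone (the CRUSH entry, the auth key, and the osdmap entry). The removal sequence I believe is needed before the ID frees up (please correct me if I missed a step):

# ceph osd crush remove osd.15
# ceph auth del osd.15
# ceph osd rm 15
# ceph osd create          (should now return 15 if it is the lowest free ID)

Is this the right way to get an OSD back with its old ID, or is there a more direct way to choose the ID?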

3) I tried another thing: dumping the crushmap and removing everything (including the devices and buckets sections) related to the OSDs I removed. However, after I set the crushmap and dumped it out again, I found that the removed OSDs' lines still appear in the devices section (though not in the buckets section), such as:
# devices
device 0 osd.0
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 device5
...
device 14 device14
device 15 device15


Is there any way to remove them? Does it matter when I want to add new OSDs?
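For completeness, the edit/re-inject cycle I used was roughly the following (so you can tell me if I went wrong somewhere); my guess is that the leftover "device N deviceN" lines are just placeholders that crushtool emits for unused IDs, but I would like confirmation:

# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt
(edit crushmap.txt: remove the device and bucket entries for the deleted OSDs)
# crushtool -c crushmap.txt -o crushmap.new
# ceph osd setcrushmap -i crushmap.new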

Please let me know if you have any comments. Thank you.

Best Regards,
FaHui

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
