I have an interesting problem. For a few weeks, we've been testing Luminous in a cluster made up of 8 servers with about 20 SSDs distributed almost evenly across them. It is running erasure coding.
Yesterday, we decided to bring the cluster down to a minimum of 8 servers with 1 disk per server.
So we went ahead and removed the additional disks from the Ceph cluster by executing commands like this from the admin server:
$ ceph osd purge osd.20 --yes-i-really-mean-it
Error EBUSY: osd.20 is not `down`.
So I logged in to the host it resides on and stopped it:
$ systemctl stop ceph-osd@20
$ ceph osd purge osd.20 --yes-i-really-mean-it
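For context, the full per-OSD removal flow on Luminous is roughly the sketch below; marking the OSD out first and letting the data drain avoids purging an OSD that is still up (osd.20 only as the example id, not necessarily the exact sequence we ran every time; purge rolls up the older "crush remove" / "auth del" / "osd rm" steps):
$ ceph osd out osd.20                            # stop mapping data to it and let the cluster rebalance
$ systemctl stop ceph-osd@20                     # on the host that carries the OSD
$ ceph osd purge osd.20 --yes-i-really-mean-it   # drop it from the CRUSH map, auth database and OSD map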
We waited for the cluster to be healthy once again and I physically removed the disks (hot swap, connected to an LSI 3008 controller). A few minutes after that, I needed to turn off one of the OSD servers to swap out a piece of hardware inside. So, I issued:
And proceeded to turn off that 1 OSD server.
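For what it's worth, the usual pattern for planned downtime of an OSD host on Luminous is the noout flag, which keeps the cluster from marking that host's OSDs out and rebalancing while it is powered off; a minimal sketch of that pattern, not necessarily what was typed here:
$ ceph osd set noout      # before powering the host down
$ ceph osd unset noout    # once the host and its OSDs are back up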
But then the interesting thing happened. Once that one server came back up, the cluster all of a sudden showed that only 2 of the 8 OSDs (one per node) were up!
Even more interesting: on each OSD server, Ceph still seems to think the missing disks are there!
When I start Ceph on each OSD server with "systemctl start ceph-osd.target", /var/log/ceph gets filled with logs for disks that are not supposed to exist anymore.
The contents of the logs show something like:
# cat /var/log/ceph/ceph-osd.7.log
2017-10-20 08:45:16.389432 7f8ee6e36d00 0 set uid:gid to 167:167 (ceph:ceph)
2017-10-20 08:45:16.389449 7f8ee6e36d00 0 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process (unknown), pid 2591
2017-10-20 08:45:16.389639 7f8ee6e36d00 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-7: (2) No such file or directory
2017-10-20 08:45:36.639439 7fb389277d00 0 set uid:gid to 167:167 (ceph:ceph)
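As far as I understand it, ceph-osd.target simply pulls in every ceph-osd@<id> instance that is still enabled on the host, whether or not that OSD still exists in the cluster map, which would explain the log spam. A quick way to see which instances a host will try to start (id 7 taken from the log above):
$ systemctl list-units --all 'ceph-osd@*'   # every ceph-osd instance systemd still knows about on this host
$ systemctl is-enabled ceph-osd@7           # "enabled" means ceph-osd.target will keep starting it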
The actual Ceph cluster sees only 8 OSDs, as you can see here:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-3 1.86469 host ceph-las1-a1-osd
1 ssd 1.86469 osd.1 down 0 1.00000
-5 0.87320 host ceph-las1-a2-osd
2 ssd 0.87320 osd.2 down 0 1.00000
-7 0.87320 host ceph-las1-a3-osd
4 ssd 0.87320 osd.4 down 1.00000 1.00000
-9 0.87320 host ceph-las1-a4-osd
8 ssd 0.87320 osd.8 up 1.00000 1.00000
-11 0.87320 host ceph-las1-a5-osd
12 ssd 0.87320 osd.12 down 1.00000 1.00000
-13 0.87320 host ceph-las1-a6-osd
17 ssd 0.87320 osd.17 up 1.00000 1.00000
-15 0.87320 host ceph-las1-a7-osd
21 ssd 0.87320 osd.21 down 1.00000 1.00000
-17 0.87000 host ceph-las1-a8-osd
28 ssd 0.87000 osd.28 down 0 1.00000
Linux, on the OSD servers, also seems to think the disks are still in:
Filesystem Size Used Avail Use% Mounted on
/dev/sde2 976M 183M 727M 21% /boot
/dev/sdd1 97M 5.4M 92M 6% /var/lib/ceph/osd/ceph-7
/dev/sdc1 97M 5.4M 92M 6% /var/lib/ceph/osd/ceph-6
/dev/sda1 97M 5.4M 92M 6% /var/lib/ceph/osd/ceph-4
/dev/sdb1 97M 5.4M 92M 6% /var/lib/ceph/osd/ceph-5
tmpfs 6.3G 0 6.3G 0% /run/user/0
It should show only one Ceph OSD mount, not four.
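Those 97M partitions look like the small data partitions that ceph-disk creates for BlueStore OSDs, so with the devices physically pulled these are presumably just stale mount entries. Clearing them by hand would look something like this (ceph-7 only as an example, and obviously only for OSDs that really were purged):
$ umount -l /var/lib/ceph/osd/ceph-7   # lazy unmount, since the underlying device is already gone
$ rm -rf /var/lib/ceph/osd/ceph-7      # remove the now-empty mount point so nothing re-activates from it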
I tried to issue the removal commands again, this time on the OSD server itself:
$ ceph osd purge osd.X --yes-i-really-mean-it
Yet, if I again issue "systemctl start ceph-osd.target", /var/log/ceph again shows logs for a disk that does not exist (to make sure, I deleted all the old logs beforehand).
So it seems that, somewhere, Ceph on the OSD servers still thinks there should be more disks?
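My understanding is that "ceph osd purge" only edits the cluster maps on the monitors and never touches a host's local state, so the leftover ceph-osd@<id> systemd instances would also have to be disabled on each host before ceph-osd.target stops trying to start them; roughly (id 7 again as the example):
$ systemctl disable ceph-osd@7       # stop ceph-osd.target from pulling this instance in
$ systemctl reset-failed ceph-osd@7  # clear the failed state left over from the start attempts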
The Ceph cluster is unusable, though. We've tried everything to bring it back, but as Bones would say: it's dead, Jim.