Hi experts,
I have a strange situation right now. We are re-organizing our 4-node
Hammer cluster, moving from LVM-based OSDs to OSDs on dedicated HDDs.
When we did this on the first node last week, everything went
smoothly: I removed the old OSDs from the CRUSH map, and rebalancing
and recovery finished successfully.
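For reference, the removal on that first node was essentially the
following per old OSD (a sketch from memory; <id> stands for the
respective OSD id):
---cut here---
# drain the OSD and wait for rebalancing, then stop it
ceph osd out <id>
service ceph stop osd.<id>       # sysvinit on our systems
# remove it from the CRUSH map, delete its key and the OSD entry
ceph osd crush remove osd.<id>
ceph auth del osd.<id>
ceph osd rm <id>
---cut here---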
This weekend we did the same on the second node: we created the
HDD-based OSDs, added them to the cluster, waited for rebalancing
to finish and then stopped the old OSDs. Only this time recovery
didn't finish completely; 4 PGs remained stuck unclean. I found out
that 3 of these 4 PGs had their primary OSD on that node, so I
restarted the respective OSD services and those 3 PGs recovered
successfully. But there is one last PG that gives me headaches.
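This is roughly how I spotted and kicked the affected PGs (sketch):
---cut here---
# list the stuck PGs together with their up/acting sets
ceph pg dump_stuck unclean
# restart the primary OSD of an affected PG
service ceph restart osd.<id>
---cut here---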
ceph@ndesan01:~ # ceph pg map 1.3d3
osdmap e24320 pg 1.3d3 (1.3d3) -> up [16,21] acting [16,21,0]
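The PG can also be queried directly; I assume the recovery_state
section of the output is where a blocked or missing peer would show
up:
---cut here---
# detailed peering/recovery state of the stuck PG
ceph pg 1.3d3 query
---cut here---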
ceph@ndesan01:~/ceph-deploy> ceph osd tree
ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 9.38985 root default
-2 1.19995     host ndesan01
 0 0.23999         osd.0          up  1.00000          1.00000
 1 0.23999         osd.1          up  1.00000          1.00000
 2 0.23999         osd.2          up  1.00000          1.00000
13 0.23999         osd.13         up  1.00000          1.00000
19 0.23999         osd.19         up  1.00000          1.00000
-3 1.81998     host ndesan02
 3       0         osd.3        down        0          1.00000
 4       0         osd.4        down        0          1.00000
 5       0         osd.5        down        0          1.00000
 9       0         osd.9        down  1.00000          1.00000
10       0         osd.10       down  1.00000          1.00000
 6 0.90999         osd.6          up  1.00000          1.00000
 7 0.90999         osd.7          up  1.00000          1.00000
-4 1.81998     host nde32
20 0.90999         osd.20         up  1.00000          1.00000
21 0.90999         osd.21         up  1.00000          1.00000
-5 4.54994     host ndesan03
14 0.90999         osd.14         up  1.00000          1.00000
15 0.90999         osd.15         up  1.00000          1.00000
16 0.90999         osd.16         up  1.00000          1.00000
17 0.90999         osd.17         up  1.00000          1.00000
18 0.90999         osd.18         up  1.00000          1.00000
All OSDs marked "down" are going to be removed. I looked for that
PG on all 3 nodes of its acting set, and all of them have it. All
services are up and running, but for some reason this PG doesn't seem
to be aware of that. Is there any reasonable explanation and/or some
advice on how to get that PG recovered?
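This is how I checked for the PG on each node (sketch):
---cut here---
# run on each of the three hosts; the head directory exists everywhere
ls -d /var/lib/ceph/osd/ceph-*/current/1.3d3_head
---cut here---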
One thing I noticed: the data on the primary OSD (osd.16) has
different timestamps than the data on the other two OSDs:
---cut here---
ndesan03:~ # ls -rtl /var/lib/ceph/osd/ceph-16/current/1.3d3_head/
total 389436
-rw-r--r-- 1 root root       0 Jul 12  2016 __head_000003D3__1
...
-rw-r--r-- 1 root root       0 Jan  9 10:43 rbd\udata.bca465368d6b49.0000000000000a06__head_20EFF3D3__1
-rw-r--r-- 1 root root       0 Jan  9 10:43 rbd\udata.bca465368d6b49.0000000000000a8b__head_A014F3D3__1
-rw-r--r-- 1 root root       0 Jan  9 10:44 rbd\udata.bca465368d6b49.0000000000000e2c__head_00F2D3D3__1
-rw-r--r-- 1 root root       0 Jan  9 10:44 rbd\udata.bca465368d6b49.0000000000000e6a__head_C91813D3__1
-rw-r--r-- 1 root root 8388608 Jan 20 13:53 rbd\udata.cc94344e6afb66.00000000000008cb__head_6AA4B3D3__1
-rw-r--r-- 1 root root 8388608 Jan 20 14:47 rbd\udata.e15aee238e1f29.00000000000005f0__head_C95063D3__1
-rw-r--r-- 1 root root 8388608 Jan 20 15:10 rbd\udata.e15aee238e1f29.0000000000000d15__head_FF1083D3__1
-rw-r--r-- 1 root root 8388608 Jan 20 15:19 rbd\udata.e15aee238e1f29.000000000000100c__head_6B17F3D3__1
-rw-r--r-- 1 root root 8388608 Jan 23 14:17 rbd\udata.e73cf7b03e0c6.0000000000000479__head_C16003D3__1
-rw-r--r-- 1 root root 8388608 Jan 25 11:52 rbd\udata.d4edc95e884adc.00000000000000f4__head_00EE43D3__1
-rw-r--r-- 1 root root 4194304 Jan 27 08:07 rbd\udata.34595be2237e6.0000000000000ad5__head_D3CC93D3__1
-rw-r--r-- 1 root root 4194304 Jan 27 08:08 rbd\udata.34595be2237e6.0000000000000aff__head_3BF633D3__1
-rw-r--r-- 1 root root 4194304 Jan 27 16:20 rbd\udata.8b61c69f34baf.000000000000876a__head_A60A63D3__1
-rw-r--r-- 1 root root 4194304 Jan 29 17:45 rbd\udata.28fcaf199543c3.0000000000000ae7__head_C1BA53D3__1
-rw-r--r-- 1 root root 4194304 Jan 30 06:33 rbd\udata.28fcaf199543c3.0000000000001832__head_6EC113D3__1
-rw-r--r-- 1 root root 4194304 Jan 31 10:33 rb.0.ddcdf5.238e1f29.0000000000e4__head_3F1543D3__1
-rw-r--r-- 1 root root 4194304 Feb 13 06:14 rbd\udata.856071751c29d.000000000000617b__head_E1E4A3D3__1
---cut here---
The other two OSDs have identical timestamps; I'll just post the
(shortened) output of osd.21:
---cut here---
nde32:/var/lib/ceph/osd/ceph-21/current # ls -lrt /var/lib/ceph/osd/ceph-21/current/1.3d3_head/
total 389432
-rw-r--r-- 1 root root       0 Feb  6 15:29 __head_000003D3__1
...
-rw-r--r-- 1 root root       0 Feb  6 16:46 rbd\udata.a00851d652069.00000000000007a4__head_C55DB3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.947feb21a163a2.0000000000004349__head_A37FB3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.8b61c69f34baf.00000000000068cb__head_B4A2C3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.874a620334da.00000000000004ed__head_3835C3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.8b61c69f34baf.0000000000004424__head_5BA7C3D3__1
-rw-r--r-- 1 root root 8388608 Feb  6 16:47 rbd\udata.31a3e57d64476.0000000000000418__head_B158C3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.1128db1b5d2111.00000000000002eb__head_81AAC3D3__1
-rw-r--r-- 1 root root       0 Feb  6 16:47 rbd\udata.bca465368d6b49.0000000000000e2c__head_00F2D3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.2d6fe91cf37a46.000000000000019e__head_2346D3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.856071751c29d.0000000000006134__head_C876E3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.949da61c92b32c.0000000000000a18__head_397BE3D3__1
-rw-r--r-- 1 root root 8388608 Feb  6 16:47 rbd\udata.567d57d819eed.000000000000034f__head_FC83F3D3__1
-rw-r--r-- 1 root root       0 Feb  6 16:47 rbd\udata.bca465368d6b49.0000000000000a8b__head_A014F3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.856071751c29d.0000000000003a2c__head_0684F3D3__1
-rw-r--r-- 1 root root 8388608 Feb  6 16:47 rbd\udata.e15aee238e1f29.000000000000100c__head_6B17F3D3__1
-rw-r--r-- 1 root root       0 Feb  6 16:47 rbd\udata.bca465368d6b49.0000000000000a06__head_20EFF3D3__1
-rw-r--r-- 1 root root 4194304 Feb 13 06:14 rbd\udata.856071751c29d.000000000000617b__head_E1E4A3D3__1
---cut here---
So I figured the data on the primary OSD could be the problem,
copied the PG's contents over from one of the other OSDs and
restarted all 3 OSDs, but the status didn't change. How can I repair
this PG?
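The copy itself was essentially this (a sketch; the OSDs were stopped
during the copy):
---cut here---
# on ndesan03; the trailing slashes make rsync copy the contents
rsync -a nde32:/var/lib/ceph/osd/ceph-21/current/1.3d3_head/ \
         /var/lib/ceph/osd/ceph-16/current/1.3d3_head/
---cut here---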
Another question about OSD replacement: why didn't the cluster switch
the primary OSD for all affected PGs when the old OSDs went down? If
this had been a real disk failure, I would have doubts about a full
recovery. Or should I have deleted that PG instead of re-activating
the old OSDs? I'm not sure what the best practice would be in this
case.
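For completeness, these are the next steps I'm aware of, though I'm
not sure which of them (if any) is appropriate here:
---cut here---
# ask the PG to re-check / repair itself
ceph pg deep-scrub 1.3d3
ceph pg repair 1.3d3
# or export the PG from a healthy OSD with the objectstore tool
# (OSD stopped; an import on the primary would be the counterpart)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
    --journal-path /var/lib/ceph/osd/ceph-21/journal \
    --pgid 1.3d3 --op export --file /tmp/1.3d3.export
---cut here---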
Any help is appreciated!
Regards,
Eugen