1 PG stuck unclean (active+remapped) after OSD replacement

Hi experts,

I have a strange situation right now. We are re-organizing our 4-node Hammer cluster, moving the OSDs from LVM volumes to whole HDDs. When we did this on the first node last week, everything went smoothly: I removed the old OSDs from the CRUSH map, and rebalancing and recovery finished successfully. This weekend we did the same on the second node: we created the new HDD-based OSDs, added them to the cluster, waited for rebalancing to finish and then stopped the old OSDs. Only this time the recovery didn't completely finish, and 4 PGs stayed stuck unclean. I found out that 3 of these 4 PGs had their primary OSD on that node, so I restarted the respective OSD services and those 3 PGs recovered successfully. But the last PG is giving me headaches.

ceph@ndesan01:~ # ceph pg map 1.3d3
osdmap e24320 pg 1.3d3 (1.3d3) -> up [16,21] acting [16,21,0]
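
If more detail helps, these are the commands I would run to gather the full state of that PG; I can post their output on request:

ceph pg 1.3d3 query      # full peering/recovery state of the PG
ceph health detail       # lists the PG among the stuck/unclean ones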

ceph@ndesan01:~/ceph-deploy> ceph osd tree
ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 9.38985 root default
-2 1.19995     host ndesan01
 0 0.23999         osd.0          up  1.00000          1.00000
 1 0.23999         osd.1          up  1.00000          1.00000
 2 0.23999         osd.2          up  1.00000          1.00000
13 0.23999         osd.13         up  1.00000          1.00000
19 0.23999         osd.19         up  1.00000          1.00000
-3 1.81998     host ndesan02
 3       0         osd.3        down        0          1.00000
 4       0         osd.4        down        0          1.00000
 5       0         osd.5        down        0          1.00000
 9       0         osd.9        down  1.00000          1.00000
10       0         osd.10       down  1.00000          1.00000
 6 0.90999         osd.6          up  1.00000          1.00000
 7 0.90999         osd.7          up  1.00000          1.00000
-4 1.81998     host nde32
20 0.90999         osd.20         up  1.00000          1.00000
21 0.90999         osd.21         up  1.00000          1.00000
-5 4.54994     host ndesan03
14 0.90999         osd.14         up  1.00000          1.00000
15 0.90999         osd.15         up  1.00000          1.00000
16 0.90999         osd.16         up  1.00000          1.00000
17 0.90999         osd.17         up  1.00000          1.00000
18 0.90999         osd.18         up  1.00000          1.00000


All OSDs marked "down" are going to be removed. I looked for that PG on all 3 nodes involved, and each of them has a copy of it. All OSD services are up and running, but for some reason this PG doesn't seem to notice. Is there a reasonable explanation, and/or any advice on how to get that PG recovered?
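
One thing I have considered but not tried yet is forcing the PG to re-peer by briefly marking its primary down (the daemon should report back in right away and trigger a new peering round); I'm not sure whether that's advisable here:

ceph osd down 16     # mark the primary of pg 1.3d3 down to trigger re-peering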

One thing I noticed:

The data on the primary OSD (osd.16) had different timestamps than on the other two OSDs:

---cut here---
ndesan03:~ # ls -rtl /var/lib/ceph/osd/ceph-16/current/1.3d3_head/
total 389436
-rw-r--r-- 1 root root       0 Jul 12  2016 __head_000003D3__1
...
-rw-r--r-- 1 root root       0 Jan  9 10:43 rbd\udata.bca465368d6b49.0000000000000a06__head_20EFF3D3__1
-rw-r--r-- 1 root root       0 Jan  9 10:43 rbd\udata.bca465368d6b49.0000000000000a8b__head_A014F3D3__1
-rw-r--r-- 1 root root       0 Jan  9 10:44 rbd\udata.bca465368d6b49.0000000000000e2c__head_00F2D3D3__1
-rw-r--r-- 1 root root       0 Jan  9 10:44 rbd\udata.bca465368d6b49.0000000000000e6a__head_C91813D3__1
-rw-r--r-- 1 root root 8388608 Jan 20 13:53 rbd\udata.cc94344e6afb66.00000000000008cb__head_6AA4B3D3__1
-rw-r--r-- 1 root root 8388608 Jan 20 14:47 rbd\udata.e15aee238e1f29.00000000000005f0__head_C95063D3__1
-rw-r--r-- 1 root root 8388608 Jan 20 15:10 rbd\udata.e15aee238e1f29.0000000000000d15__head_FF1083D3__1
-rw-r--r-- 1 root root 8388608 Jan 20 15:19 rbd\udata.e15aee238e1f29.000000000000100c__head_6B17F3D3__1
-rw-r--r-- 1 root root 8388608 Jan 23 14:17 rbd\udata.e73cf7b03e0c6.0000000000000479__head_C16003D3__1
-rw-r--r-- 1 root root 8388608 Jan 25 11:52 rbd\udata.d4edc95e884adc.00000000000000f4__head_00EE43D3__1
-rw-r--r-- 1 root root 4194304 Jan 27 08:07 rbd\udata.34595be2237e6.0000000000000ad5__head_D3CC93D3__1
-rw-r--r-- 1 root root 4194304 Jan 27 08:08 rbd\udata.34595be2237e6.0000000000000aff__head_3BF633D3__1
-rw-r--r-- 1 root root 4194304 Jan 27 16:20 rbd\udata.8b61c69f34baf.000000000000876a__head_A60A63D3__1
-rw-r--r-- 1 root root 4194304 Jan 29 17:45 rbd\udata.28fcaf199543c3.0000000000000ae7__head_C1BA53D3__1
-rw-r--r-- 1 root root 4194304 Jan 30 06:33 rbd\udata.28fcaf199543c3.0000000000001832__head_6EC113D3__1
-rw-r--r-- 1 root root 4194304 Jan 31 10:33 rb.0.ddcdf5.238e1f29.0000000000e4__head_3F1543D3__1
-rw-r--r-- 1 root root 4194304 Feb 13 06:14 rbd\udata.856071751c29d.000000000000617b__head_E1E4A3D3__1
---cut here---

The other two OSDs have identical timestamps; here is the (shortened) output from osd.21:

---cut here---
nde32:/var/lib/ceph/osd/ceph-21/current # ls -lrt /var/lib/ceph/osd/ceph-21/current/1.3d3_head/
total 389432
-rw-r--r-- 1 root root       0 Feb  6 15:29 __head_000003D3__1
...
-rw-r--r-- 1 root root       0 Feb  6 16:46 rbd\udata.a00851d652069.00000000000007a4__head_C55DB3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.947feb21a163a2.0000000000004349__head_A37FB3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.8b61c69f34baf.00000000000068cb__head_B4A2C3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.874a620334da.00000000000004ed__head_3835C3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.8b61c69f34baf.0000000000004424__head_5BA7C3D3__1
-rw-r--r-- 1 root root 8388608 Feb  6 16:47 rbd\udata.31a3e57d64476.0000000000000418__head_B158C3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.1128db1b5d2111.00000000000002eb__head_81AAC3D3__1
-rw-r--r-- 1 root root       0 Feb  6 16:47 rbd\udata.bca465368d6b49.0000000000000e2c__head_00F2D3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.2d6fe91cf37a46.000000000000019e__head_2346D3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.856071751c29d.0000000000006134__head_C876E3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.949da61c92b32c.0000000000000a18__head_397BE3D3__1
-rw-r--r-- 1 root root 8388608 Feb  6 16:47 rbd\udata.567d57d819eed.000000000000034f__head_FC83F3D3__1
-rw-r--r-- 1 root root       0 Feb  6 16:47 rbd\udata.bca465368d6b49.0000000000000a8b__head_A014F3D3__1
-rw-r--r-- 1 root root 4194304 Feb  6 16:47 rbd\udata.856071751c29d.0000000000003a2c__head_0684F3D3__1
-rw-r--r-- 1 root root 8388608 Feb  6 16:47 rbd\udata.e15aee238e1f29.000000000000100c__head_6B17F3D3__1
-rw-r--r-- 1 root root       0 Feb  6 16:47 rbd\udata.bca465368d6b49.0000000000000a06__head_20EFF3D3__1
-rw-r--r-- 1 root root 4194304 Feb 13 06:14 rbd\udata.856071751c29d.000000000000617b__head_E1E4A3D3__1
---cut here---

So I figured the data on the primary OSD could be the problem, copied the contents over from one of the other OSDs and restarted all 3 OSDs, but the status didn't change. How can I repair this PG?
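
If copying at the filesystem level is the wrong approach, I was wondering whether a deep scrub followed by a repair, or an export/import of the PG with ceph-objectstore-tool (with the affected OSDs stopped and the export file copied between the nodes), would be the cleaner way. This is only a rough, untested sketch, and I'm not sure about the exact syntax on Hammer:

ceph pg deep-scrub 1.3d3     # check whether the replicas really differ
ceph pg repair 1.3d3         # let the primary repair inconsistent copies

# or, with the OSD daemons stopped:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
    --journal-path /var/lib/ceph/osd/ceph-21/journal \
    --pgid 1.3d3 --op export --file /tmp/pg1.3d3.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
    --journal-path /var/lib/ceph/osd/ceph-16/journal \
    --pgid 1.3d3 --op remove
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
    --journal-path /var/lib/ceph/osd/ceph-16/journal \
    --pgid 1.3d3 --op import --file /tmp/pg1.3d3.export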

Another question about OSD replacement: why didn't the cluster elect a new primary for all affected PGs when the old OSDs went down? If this had been a real disk failure, I would have doubts about a full recovery. Or should I have deleted that PG instead of re-activating the old OSDs? I'm not sure what the best practice would be in this case.
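
For reference, this is the removal sequence I have been following for the old OSDs; if there is a better order (e.g. to avoid rebalancing twice), I'd be glad to hear it:

ceph osd out 3                # stop placing data on the OSD, triggers rebalancing
# wait for recovery to finish, then stop the daemon on the node
ceph osd crush remove osd.3   # remove it from the CRUSH map
ceph auth del osd.3           # remove its authentication key
ceph osd rm 3                 # remove the OSD id from the cluster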

Any help is appreciated!

Regards,
Eugen

--
Eugen Block                             voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG      fax     : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg                         e-mail  : eblock@xxxxxx

        Chairwoman of the Supervisory Board: Angelika Mozdzen
          Registered office and register court: Hamburg, HRB 90934
                  Executive Board: Jens-U. Mozdzen
                   VAT ID no. DE 814 013 983
