Re: 1 PG stuck unclean (active+remapped) after OSD replacement

> On 13 February 2017 at 16:03, Eugen Block <eblock@xxxxxx> wrote:
> 
> 
> Hi experts,
> 
> I have a strange situation right now. We are re-organizing our 4-node
> Hammer cluster, moving from LVM-based OSDs to HDD-based OSDs. When we
> did this on the first node last week, everything went smoothly: I
> removed the OSDs from the CRUSH map, and rebalancing and recovery
> finished successfully.
> This weekend we did the same with the second node: we created the
> HDD-based OSDs, added them to the cluster, waited for rebalancing to
> finish and then stopped the old OSDs. Only this time the recovery
> didn't completely finish; 4 PGs remained stuck unclean. I found out
> that 3 of these 4 PGs had their primary OSD on that node, so I
> restarted the respective services and those 3 PGs recovered
> successfully. But there is one last PG that is giving me headaches.
> 
> ceph@ndesan01:~ # ceph pg map 1.3d3
> osdmap e24320 pg 1.3d3 (1.3d3) -> up [16,21] acting [16,21,0]
> 

Which version of Ceph are you running now? Could it be that the cluster still has old CRUSH tunables? Which Ceph version was the cluster originally installed with?

Wido

> ceph@ndesan01:~/ceph-deploy> ceph osd tree
> ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 9.38985 root default
> -2 1.19995     host ndesan01
>   0 0.23999         osd.0          up  1.00000          1.00000
>   1 0.23999         osd.1          up  1.00000          1.00000
>   2 0.23999         osd.2          up  1.00000          1.00000
> 13 0.23999         osd.13         up  1.00000          1.00000
> 19 0.23999         osd.19         up  1.00000          1.00000
> -3 1.81998     host ndesan02
>   3       0         osd.3        down        0          1.00000
>   4       0         osd.4        down        0          1.00000
>   5       0         osd.5        down        0          1.00000
>   9       0         osd.9        down  1.00000          1.00000
> 10       0         osd.10       down  1.00000          1.00000
>   6 0.90999         osd.6          up  1.00000          1.00000
>   7 0.90999         osd.7          up  1.00000          1.00000
> -4 1.81998     host nde32
> 20 0.90999         osd.20         up  1.00000          1.00000
> 21 0.90999         osd.21         up  1.00000          1.00000
> -5 4.54994     host ndesan03
> 14 0.90999         osd.14         up  1.00000          1.00000
> 15 0.90999         osd.15         up  1.00000          1.00000
> 16 0.90999         osd.16         up  1.00000          1.00000
> 17 0.90999         osd.17         up  1.00000          1.00000
> 18 0.90999         osd.18         up  1.00000          1.00000
> 
> 
> All OSDs marked as "down" are going to be removed. I looked for that
> PG on all 3 nodes, and all of them have it. All services are up and
> running, but for some reason this PG does not seem to be aware of
> that. Is there any reasonable explanation and/or any advice on how to
> get that PG recovered?
> 
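As a starting point, the PG's peering and recovery state can usually be
inspected with (using the PG id from the output above):

  ceph health detail
  ceph pg 1.3d3 query

The "recovery_state" section of the query output typically shows which
OSDs the PG is waiting for and why the acting set still differs from the
up set.
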
> One thing I noticed:
> 
> The data on the primary OSD (osd.16) had different timestamps than on  
> the other two OSDs:
> 
> ---cut here---
> ndesan03:~ # ls -rtl /var/lib/ceph/osd/ceph-16/current/1.3d3_head/
> total 389436
> -rw-r--r-- 1 root root       0 Jul 12  2016 __head_000003D3__1
> ...
> -rw-r--r-- 1 root root       0 Jan  9 10:43  
> rbd\udata.bca465368d6b49.0000000000000a06__head_20EFF3D3__1
> -rw-r--r-- 1 root root       0 Jan  9 10:43  
> rbd\udata.bca465368d6b49.0000000000000a8b__head_A014F3D3__1
> -rw-r--r-- 1 root root       0 Jan  9 10:44  
> rbd\udata.bca465368d6b49.0000000000000e2c__head_00F2D3D3__1
> -rw-r--r-- 1 root root       0 Jan  9 10:44  
> rbd\udata.bca465368d6b49.0000000000000e6a__head_C91813D3__1
> -rw-r--r-- 1 root root 8388608 Jan 20 13:53  
> rbd\udata.cc94344e6afb66.00000000000008cb__head_6AA4B3D3__1
> -rw-r--r-- 1 root root 8388608 Jan 20 14:47  
> rbd\udata.e15aee238e1f29.00000000000005f0__head_C95063D3__1
> -rw-r--r-- 1 root root 8388608 Jan 20 15:10  
> rbd\udata.e15aee238e1f29.0000000000000d15__head_FF1083D3__1
> -rw-r--r-- 1 root root 8388608 Jan 20 15:19  
> rbd\udata.e15aee238e1f29.000000000000100c__head_6B17F3D3__1
> -rw-r--r-- 1 root root 8388608 Jan 23 14:17  
> rbd\udata.e73cf7b03e0c6.0000000000000479__head_C16003D3__1
> -rw-r--r-- 1 root root 8388608 Jan 25 11:52  
> rbd\udata.d4edc95e884adc.00000000000000f4__head_00EE43D3__1
> -rw-r--r-- 1 root root 4194304 Jan 27 08:07  
> rbd\udata.34595be2237e6.0000000000000ad5__head_D3CC93D3__1
> -rw-r--r-- 1 root root 4194304 Jan 27 08:08  
> rbd\udata.34595be2237e6.0000000000000aff__head_3BF633D3__1
> -rw-r--r-- 1 root root 4194304 Jan 27 16:20  
> rbd\udata.8b61c69f34baf.000000000000876a__head_A60A63D3__1
> -rw-r--r-- 1 root root 4194304 Jan 29 17:45  
> rbd\udata.28fcaf199543c3.0000000000000ae7__head_C1BA53D3__1
> -rw-r--r-- 1 root root 4194304 Jan 30 06:33  
> rbd\udata.28fcaf199543c3.0000000000001832__head_6EC113D3__1
> -rw-r--r-- 1 root root 4194304 Jan 31 10:33  
> rb.0.ddcdf5.238e1f29.0000000000e4__head_3F1543D3__1
> -rw-r--r-- 1 root root 4194304 Feb 13 06:14  
> rbd\udata.856071751c29d.000000000000617b__head_E1E4A3D3__1
> ---cut here---
> 
> The other two OSDs have identical timestamps; I'll just post the
> (shortened) output of osd.21:
> 
> ---cut here---
> nde32:/var/lib/ceph/osd/ceph-21/current # ls -lrt  
> /var/lib/ceph/osd/ceph-21/current/1.3d3_head/
> total 389432
> -rw-r--r-- 1 root root       0 Feb  6 15:29 __head_000003D3__1
> ...
> -rw-r--r-- 1 root root       0 Feb  6 16:46  
> rbd\udata.a00851d652069.00000000000007a4__head_C55DB3D3__1
> -rw-r--r-- 1 root root 4194304 Feb  6 16:47  
> rbd\udata.947feb21a163a2.0000000000004349__head_A37FB3D3__1
> -rw-r--r-- 1 root root 4194304 Feb  6 16:47  
> rbd\udata.8b61c69f34baf.00000000000068cb__head_B4A2C3D3__1
> -rw-r--r-- 1 root root 4194304 Feb  6 16:47  
> rbd\udata.874a620334da.00000000000004ed__head_3835C3D3__1
> -rw-r--r-- 1 root root 4194304 Feb  6 16:47  
> rbd\udata.8b61c69f34baf.0000000000004424__head_5BA7C3D3__1
> -rw-r--r-- 1 root root 8388608 Feb  6 16:47  
> rbd\udata.31a3e57d64476.0000000000000418__head_B158C3D3__1
> -rw-r--r-- 1 root root 4194304 Feb  6 16:47  
> rbd\udata.1128db1b5d2111.00000000000002eb__head_81AAC3D3__1
> -rw-r--r-- 1 root root       0 Feb  6 16:47  
> rbd\udata.bca465368d6b49.0000000000000e2c__head_00F2D3D3__1
> -rw-r--r-- 1 root root 4194304 Feb  6 16:47  
> rbd\udata.2d6fe91cf37a46.000000000000019e__head_2346D3D3__1
> -rw-r--r-- 1 root root 4194304 Feb  6 16:47  
> rbd\udata.856071751c29d.0000000000006134__head_C876E3D3__1
> -rw-r--r-- 1 root root 4194304 Feb  6 16:47  
> rbd\udata.949da61c92b32c.0000000000000a18__head_397BE3D3__1
> -rw-r--r-- 1 root root 8388608 Feb  6 16:47  
> rbd\udata.567d57d819eed.000000000000034f__head_FC83F3D3__1
> -rw-r--r-- 1 root root       0 Feb  6 16:47  
> rbd\udata.bca465368d6b49.0000000000000a8b__head_A014F3D3__1
> -rw-r--r-- 1 root root 4194304 Feb  6 16:47  
> rbd\udata.856071751c29d.0000000000003a2c__head_0684F3D3__1
> -rw-r--r-- 1 root root 8388608 Feb  6 16:47  
> rbd\udata.e15aee238e1f29.000000000000100c__head_6B17F3D3__1
> -rw-r--r-- 1 root root       0 Feb  6 16:47  
> rbd\udata.bca465368d6b49.0000000000000a06__head_20EFF3D3__1
> -rw-r--r-- 1 root root 4194304 Feb 13 06:14  
> rbd\udata.856071751c29d.000000000000617b__head_E1E4A3D3__1
> ---cut here---
> 
> So I figured that the data on the primary OSD could be the problem,
> copied the content over from one of the other OSDs and restarted all
> 3 OSDs, but the status didn't change. How can I repair this PG?
> 
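Two things that are usually worth trying before touching any files by
hand, assuming the OSD daemons themselves are healthy (a sketch only, not
a verified fix for this particular PG):

  ceph osd down 16      # mark the primary down in the osdmap only; the
                        # daemon keeps running, reports back up and
                        # re-peers its PGs
  ceph pg repair 1.3d3  # trigger a scrub/repair of the PG

Manually copying files below /var/lib/ceph/osd/.../current/ is generally
not advisable, since the OSD also tracks object metadata (xattrs/omap)
that a plain file copy does not update. Differing mtimes between replicas
are expected and not an error by themselves.
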
> Another question about OSD replacement: why didn't the cluster switch
> the primary OSD for all affected PGs when the OSDs went down? If this
> had been a real disk failure, I would have doubts about a full
> recovery. Or should I have deleted that PG instead of re-activating
> the old OSDs? I'm not sure what the best practice would be in this
> case.
> 
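For reference, the sequence that usually works for retiring an OSD looks
roughly like this (osd.3 used as an example id; repeat per OSD and let
the cluster settle in between):

  ceph osd out 3                  # drain data off the OSD first
  # ... wait for rebalancing / HEALTH_OK ...
  service ceph stop osd.3         # or: systemctl stop ceph-osd@3
  ceph osd crush remove osd.3
  ceph auth del osd.3
  ceph osd rm 3

Stopping an OSD without marking it out first leaves its PGs degraded or
remapped until the mon_osd_down_out_interval expires, which may be part
of what happened here.
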
> Any help is appreciated!
> 
> Regards,
> Eugen
> 
> -- 
> Eugen Block                             voice   : +49-40-559 51 75
> NDE Netzdesign und -entwicklung AG      fax     : +49-40-559 51 77
> Postfach 61 03 15
> D-22423 Hamburg                         e-mail  : eblock@xxxxxx
> 
>          Chairwoman of the Supervisory Board: Angelika Mozdzen
>            Registered office and register court: Hamburg, HRB 90934
>                    Executive Board: Jens-U. Mozdzen
>                     VAT ID: DE 814 013 983
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


