Moving this to ceph-users where it needs to be for eyeballs and responses. :)

On Mon, Apr 6, 2015 at 1:34 AM, Paul Evans <paul@xxxxxxxxxxxx> wrote:
> Hello Ceph Community & thanks to anyone with advice on this interesting situation...
> ========================================================
> The Problem: we have 2 pgs out of 6144 that are still stuck in an active+remapped state,
> and we would like to know if there is a targeted & specific way to fix this issue (other
> than just forcing data to re-sort in the cluster in a generic re-shuffle).
>
> Background: We initially created a cluster of 6 ceph nodes with an EC pool & profile where
> k=6 and m=2, but overlooked the "ruleset-failure-domain=host" setting in the EC profile
> (thinking the failure domain defaulted to osd). While the ceph cluster allowed us to create
> the pool and store data in it, the fact that we had an EC data spread designed for 8 targets
> (k+m=8) but only 6 targets (our 6 nodes) eventually caught up to us, and we ended up with a
> number of pgs missing chunks of data. Fortunately, the data remained 'relatively protected'
> because ceph remapped the missing chunks to alternate hosts, but (of course) that left the
> pgs in an active+remapped state and no way to solve the puzzle.
>
> The fix? Easy enough: add two more nodes, which we did, and *almost* all the pgs
> re-distributed their data appropriately. Except for 4 pgs. Why 4? We're not sure what was
> unique about those four, but we were able to reduce the problem pgs to just 2 (as stated in
> our Problem section) by doing the following:
>
> - executed 'ceph pg repair xx.xxx', but nothing happened. After an hour...
> - executed 'ceph pg dump_stuck' and noted that 2 of the 4 pgs had a primary OSD of 29.
> - executed 'ceph osd set noout' and 'sudo restart ceph-osd id=29'.
> - observed that the osd restart caused a minor shuffle of data, and actually left us with
>   the same 4 pgs remapped PLUS 25 pgs stuck 'peering' (and, btw, not active).
> - After a couple of hours of waiting to see if the peering issues would resolve (they
>   didn't), we moved an 'extra' OSD out of the root holding the EC pool, which kicked off a
>   significant shuffle of data and ended up with everything good again except for 2 pgs
>   still active+remapped.
>
> Which two? Ironically, even though we were attempting to fix the two pgs that had OSD 29 as
> their primary by way of our osd restart attempt, only one of them repaired itself...leaving
> one pg still having osd.29 as its primary.
>
> Where Things Stand Now: we have 2 pgs that are missing an appropriate OSD and are currently
> remapped. Here is the (shortened) output of the pg queries:
>
> pg_stat  objects  state            v          reported     up                                up_primary  acting                   acting_primary
> 10.28a   1488     active+remapped  8145'3499  49904:63527  [64,73,0,32,3,59,2147483647,61]   64          [64,73,0,32,3,59,31,61]  64
> 10.439   1455     active+remapped  8145'3423  49904:62378  [29,75,63,64,78,7,2147483647,60]  29          [29,75,63,64,78,7,8,60]  29
>
> Our question is relatively simple (we think): how does one get a pg that is built using an
> EC profile to fill in a missing OSD in its 'up' set? Neither 'ceph pg repair' nor
> 'ceph osd repair' resolved the situation for us, and just randomly forcing re-shuffles of
> data seems haphazard at best.
>
> So... does anyone have a more targeted suggestion? If so - thanks!
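>
> (In case it helps anyone digging in: the commands below are how we're pulling up the
> profile and CRUSH rule behind the pool - a rough sketch from memory, with 'ec_k6m2'
> standing in for our actual profile name:)
>
>     ceph osd erasure-code-profile ls
>     ceph osd erasure-code-profile get ec_k6m2    # shows k, m and (if set) the failure domain
>     ceph osd crush rule dump                     # lists the rules; note the erasure rule's ruleset id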
>
> Paul Evans
> Principal Architect - Daystrom Technology Group
>
> ----------------------------
>
> Two more notes:
>
> 1) We have many of these 'fault' messages in the logs and don't know if they are related in
>    some way (172.16.x.x is the cluster back-end network):
>
> 2015-04-05 20:09:33.362107 7f6ac06ce700  0 -- 172.16.1.5:6839/8638 >> 172.16.1.7:6810/2370 pipe(0x6749a580 sd=116 :6839 s=2 pgs=77 cs=1 l=0 c=0x2f4851e0).fault with nothing to send, going to standby
>
> 2) Here is the ceph -s and ceph osd tree output:
>
> ceph@lab-n1:~$ ceph -s
>     cluster 68bc69c1-1382-4c30-9bf8-480e32cc5b92
>      health HEALTH_WARN 2 pgs stuck unclean; nodeep-scrub flag(s) set
>      monmap e1: 3 mons at {lab-n1=10.0.50.211:6789/0,lab-n2=10.0.50.212:6789/0,nc48-n3=10.0.50.213:6789/0},
>             election epoch 236, quorum 0,1,2 lab-n1,lab-n2,lab-n3
>      osdmap e49905: 94 osds: 94 up, 94 in
>             flags nodeep-scrub
>       pgmap v1523516: 6144 pgs, 2 pools, 32949 GB data, 4130 kobjects
>             85133 GB used, 258 TB / 341 TB avail
>                 6142 active+clean
>                    2 active+remapped
>
> ceph@nc48-n1:~$ ceph osd tree
> # id    weight  type name       up/down reweight
> -1      320.3   root default
> -2      40.04           host lab-n1
> 0       3.64                    osd.0   up      1
> 6       3.64                    osd.6   up      1
> 12      3.64                    osd.12  up      1
> 18      3.64                    osd.18  up      1
> 24      3.64                    osd.24  up      1
> 30      3.64                    osd.30  up      1
> 36      3.64                    osd.36  up      1
> 42      3.64                    osd.42  up      1
> 48      3.64                    osd.48  up      1
> 54      3.64                    osd.54  up      1
> 60      3.64                    osd.60  up      1
> -3      40.04           host lab-n2
> 1       3.64                    osd.1   up      1
> 7       3.64                    osd.7   up      1
> 13      3.64                    osd.13  up      1
> 19      3.64                    osd.19  up      1
> 25      3.64                    osd.25  up      1
> 31      3.64                    osd.31  up      1
> 37      3.64                    osd.37  up      1
> 43      3.64                    osd.43  up      1
> 49      3.64                    osd.49  up      1
> 55      3.64                    osd.55  up      1
> 61      3.64                    osd.61  up      1
> -4      40.04           host lab-n3
> 2       3.64                    osd.2   up      1
> 8       3.64                    osd.8   up      1
> 14      3.64                    osd.14  up      1
> 20      3.64                    osd.20  up      1
> 26      3.64                    osd.26  up      1
> 32      3.64                    osd.32  up      1
> 38      3.64                    osd.38  up      1
> 44      3.64                    osd.44  up      1
> 50      3.64                    osd.50  up      1
> 56      3.64                    osd.56  up      1
> 62      3.64                    osd.62  up      1
> -5      40.04           host lab-n4
> 3       3.64                    osd.3   up      1
> 9       3.64                    osd.9   up      1
> 15      3.64                    osd.15  up      1
> 21      3.64                    osd.21  up      1
> 27      3.64                    osd.27  up      1
> 33      3.64                    osd.33  up      1
> 39      3.64                    osd.39  up      1
> 45      3.64                    osd.45  up      1
> 51      3.64                    osd.51  up      1
> 57      3.64                    osd.57  up      1
> 63      3.64                    osd.63  up      1
> -6      40.04           host lab-n5
> 4       3.64                    osd.4   up      1
> 10      3.64                    osd.10  up      1
> 16      3.64                    osd.16  up      1
> 22      3.64                    osd.22  up      1
> 28      3.64                    osd.28  up      1
> 34      3.64                    osd.34  up      1
> 40      3.64                    osd.40  up      1
> 46      3.64                    osd.46  up      1
> 52      3.64                    osd.52  up      1
> 58      3.64                    osd.58  up      1
> 64      3.64                    osd.64  up      1
> -7      40.04           host lab-n6
> 5       3.64                    osd.5   up      1
> 11      3.64                    osd.11  up      1
> 17      3.64                    osd.17  up      1
> 23      3.64                    osd.23  up      1
> 29      3.64                    osd.29  up      1
> 35      3.64                    osd.35  up      1
> 41      3.64                    osd.41  up      1
> 47      3.64                    osd.47  up      1
> 53      3.64                    osd.53  up      1
> 59      3.64                    osd.59  up      1
> 65      3.64                    osd.65  up      1
> -15     40.04           host lab-n7
> 72      3.64                    osd.72  up      1
> 74      3.64                    osd.74  up      1
> 76      3.64                    osd.76  up      1
> 78      3.64                    osd.78  up      1
> 80      3.64                    osd.80  up      1
> 82      3.64                    osd.82  up      1
> 84      3.64                    osd.84  up      1
> 86      3.64                    osd.86  up      1
> 88      3.64                    osd.88  up      1
> 90      3.64                    osd.90  up      1
> 92      3.64                    osd.92  up      1
> -16     40.04           host lab-n8
> 73      3.64                    osd.73  up      1
> 75      3.64                    osd.75  up      1
> 77      3.64                    osd.77  up      1
> 79      3.64                    osd.79  up      1
> 81      3.64                    osd.81  up      1
> 83      3.64                    osd.83  up      1
> 85      3.64                    osd.85  up      1
> 87      3.64                    osd.87  up      1
> 89      3.64                    osd.89  up      1
> 91      3.64                    osd.91  up      1
> 93      3.64                    osd.93  up      1
>
> _______________________________________________
> Ceph-community mailing list
> Ceph-community@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-community-ceph.com
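One pointer while this gathers eyeballs: the 2147483647 entries in those 'up' sets are
CRUSH's placeholder for "couldn't find an OSD for this slot", i.e. the erasure rule is
giving up before it fills the 8th position. A more targeted check than reshuffling data is
to test the rule offline and, if it reports bad mappings for 8 chunks, raise the rule's
retry budget. The commands below are only a sketch: '1' stands in for whatever ruleset id
'ceph osd crush rule dump' reports for the EC pool, the filenames are arbitrary, and
injecting an edited CRUSH map can itself move data, so treat this as a starting point
rather than a prescription.

    # grab and decompile the current CRUSH map
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt

    # see whether the erasure rule can reliably map 8 OSDs
    # ('1' is a placeholder for the EC rule's ruleset id)
    crushtool -i crush.bin --test --rule 1 --num-rep 8 --show-bad-mappings

    # if bad mappings are reported, add a line such as
    #     step set_choose_tries 100
    # as one of the first steps of the erasure rule in crush.txt (before 'step take'),
    # then recompile and inject the edited map
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new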
--

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com