Moving this to ceph-users where it needs to be for eyeballs and responses. :)

On Mon, Apr 6, 2015 at 1:34 AM, Paul Evans <paul@xxxxxxxxxxxx> wrote:
> Hello Ceph Community & thanks to anyone with advice on this interesting situation...
> ========================================================
> The Problem: we have 2 pgs out of 6144 that are still stuck in an active+remapped state,
> and we would like to know if there is a targeted & specific way to fix this issue (other
> than just forcing data to re-sort in the cluster in a generic re-shuffle).
>
> Background: We initially created a cluster of 6 ceph nodes with an EC pool & profile where
> k=6 and m=2, but overlooked the "ruleset-failure-domain=host" setting in the EC profile
> (thinking the failure domain defaulted to osd). While the ceph cluster allowed us to create
> the pool and store data in it, the fact that we had an EC data spread designed for 8 targets
> (k+m=8) but only 6 targets (our 6 nodes) eventually caught up to us, and we ended up with a
> number of pgs missing chunks of data. Fortunately, the data remained 'relatively protected'
> because ceph remapped the missing chunks to alternate hosts, but (of course) that left the
> pgs in an active+remapped state and no way to solve the puzzle.
>
> The fix? Easy enough: add two more nodes, which we did, and *almost* all the pgs
> re-distributed their data appropriately. Except for 4 pgs. Why 4? We're not sure what was
> unique about those four, but we were able to reduce the problem pgs to just 2 (as stated in
> our Problem section) by doing the following:
>
> - executed 'ceph pg repair xx.xxx', but nothing happened. After an hour...
> - executed 'ceph pg dump_stuck' and noted that 2 of the 4 pgs had a primary OSD of 29.
> - executed 'ceph osd set noout' and 'sudo restart ceph-osd id=29'.
> - observed that the osd restart caused a minor shuffle of data, and actually left us with
>   the same 4 pgs remapped PLUS 25 pgs stuck 'peering' (and, btw, not active).
> - After a couple of hours of waiting to see if the peering issues would resolve (they
>   didn't), we moved an 'extra' OSD out of the root holding the EC pool, which kicked off a
>   significant shuffle of data and ended up with everything good again except for 2 pgs
>   still active+remapped.
>
> Which two? Ironically, even though we were attempting to fix the two pgs that had OSD 29 as
> their primary by way of our osd restart attempt, only one of them repaired itself...leaving
> one pg still having osd.29 as its primary.
>
> Where Things Stand Now: we have 2 pgs that are missing an appropriate OSD and are currently
> remapped. Here is the (shortened) output of the pg queries:
>
> pg_stat  objects  state            v          reported     up                                up_primary  acting                   acting_primary
> 10.28a   1488     active+remapped  8145'3499  49904:63527  [64,73,0,32,3,59,2147483647,61]   64          [64,73,0,32,3,59,31,61]  64
> 10.439   1455     active+remapped  8145'3423  49904:62378  [29,75,63,64,78,7,2147483647,60]  29          [29,75,63,64,78,7,8,60]  29
>
> Our question is relatively simple (we think): how does one get a pg that is built using an
> EC profile to fill in a missing OSD in its 'up' set? Neither 'ceph pg repair' nor
> 'ceph osd repair' resolved the situation for us, and just randomly forcing re-shuffles of
> data seems haphazard at best.
>
> So... does anyone have a more targeted suggestion? If so - thanks!
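>
> (In case it helps anyone digging in: the commands below are how we're pulling up the
> profile and CRUSH rule behind the pool - a rough sketch from memory, with 'ec_k6m2'
> standing in for our actual profile name:)
>
>     ceph osd erasure-code-profile ls
>     ceph osd erasure-code-profile get ec_k6m2    # shows k, m and (if set) the failure domain
>     ceph osd crush rule dump                     # lists the rules; note the erasure rule's ruleset id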
>
> Paul Evans
> Principal Architect - Daystrom Technology Group
>
> ----------------------------
>
> Two more notes:
>
> 1) We have many of these 'fault' messages in the logs and don't know if they are related in
>    some way (172.16.x.x is the cluster back-end network):
>
> 2015-04-05 20:09:33.362107 7f6ac06ce700  0 -- 172.16.1.5:6839/8638 >> 172.16.1.7:6810/2370 pipe(0x6749a580 sd=116 :6839 s=2 pgs=77 cs=1 l=0 c=0x2f4851e0).fault with nothing to send, going to standby
>
> 2) Here is the ceph -s and ceph osd tree output:
>
> ceph@lab-n1:~$ ceph -s
>     cluster 68bc69c1-1382-4c30-9bf8-480e32cc5b92
>      health HEALTH_WARN 2 pgs stuck unclean; nodeep-scrub flag(s) set
>      monmap e1: 3 mons at {lab-n1=10.0.50.211:6789/0,lab-n2=10.0.50.212:6789/0,nc48-n3=10.0.50.213:6789/0},
>             election epoch 236, quorum 0,1,2 lab-n1,lab-n2,lab-n3
>      osdmap e49905: 94 osds: 94 up, 94 in
>             flags nodeep-scrub
>       pgmap v1523516: 6144 pgs, 2 pools, 32949 GB data, 4130 kobjects
>             85133 GB used, 258 TB / 341 TB avail
>                 6142 active+clean
>                    2 active+remapped
>
> ceph@nc48-n1:~$ ceph osd tree
> # id    weight  type name       up/down reweight
> -1      320.3   root default
> -2      40.04           host lab-n1
> 0       3.64                    osd.0   up      1
> 6       3.64                    osd.6   up      1
> 12      3.64                    osd.12  up      1
> 18      3.64                    osd.18  up      1
> 24      3.64                    osd.24  up      1
> 30      3.64                    osd.30  up      1
> 36      3.64                    osd.36  up      1
> 42      3.64                    osd.42  up      1
> 48      3.64                    osd.48  up      1
> 54      3.64                    osd.54  up      1
> 60      3.64                    osd.60  up      1
> -3      40.04           host lab-n2
> 1       3.64                    osd.1   up      1
> 7       3.64                    osd.7   up      1
> 13      3.64                    osd.13  up      1
> 19      3.64                    osd.19  up      1
> 25      3.64                    osd.25  up      1
> 31      3.64                    osd.31  up      1
> 37      3.64                    osd.37  up      1
> 43      3.64                    osd.43  up      1
> 49      3.64                    osd.49  up      1
> 55      3.64                    osd.55  up      1
> 61      3.64                    osd.61  up      1
> -4      40.04           host lab-n3
> 2       3.64                    osd.2   up      1
> 8       3.64                    osd.8   up      1
> 14      3.64                    osd.14  up      1
> 20      3.64                    osd.20  up      1
> 26      3.64                    osd.26  up      1
> 32      3.64                    osd.32  up      1
> 38      3.64                    osd.38  up      1
> 44      3.64                    osd.44  up      1
> 50      3.64                    osd.50  up      1
> 56      3.64                    osd.56  up      1
> 62      3.64                    osd.62  up      1
> -5      40.04           host lab-n4
> 3       3.64                    osd.3   up      1
> 9       3.64                    osd.9   up      1
> 15      3.64                    osd.15  up      1
> 21      3.64                    osd.21  up      1
> 27      3.64                    osd.27  up      1
> 33      3.64                    osd.33  up      1
> 39      3.64                    osd.39  up      1
> 45      3.64                    osd.45  up      1
> 51      3.64                    osd.51  up      1
> 57      3.64                    osd.57  up      1
> 63      3.64                    osd.63  up      1
> -6      40.04           host lab-n5
> 4       3.64                    osd.4   up      1
> 10      3.64                    osd.10  up      1
> 16      3.64                    osd.16  up      1
> 22      3.64                    osd.22  up      1
> 28      3.64                    osd.28  up      1
> 34      3.64                    osd.34  up      1
> 40      3.64                    osd.40  up      1
> 46      3.64                    osd.46  up      1
> 52      3.64                    osd.52  up      1
> 58      3.64                    osd.58  up      1
> 64      3.64                    osd.64  up      1
> -7      40.04           host lab-n6
> 5       3.64                    osd.5   up      1
> 11      3.64                    osd.11  up      1
> 17      3.64                    osd.17  up      1
> 23      3.64                    osd.23  up      1
> 29      3.64                    osd.29  up      1
> 35      3.64                    osd.35  up      1
> 41      3.64                    osd.41  up      1
> 47      3.64                    osd.47  up      1
> 53      3.64                    osd.53  up      1
> 59      3.64                    osd.59  up      1
> 65      3.64                    osd.65  up      1
> -15     40.04           host lab-n7
> 72      3.64                    osd.72  up      1
> 74      3.64                    osd.74  up      1
> 76      3.64                    osd.76  up      1
> 78      3.64                    osd.78  up      1
> 80      3.64                    osd.80  up      1
> 82      3.64                    osd.82  up      1
> 84      3.64                    osd.84  up      1
> 86      3.64                    osd.86  up      1
> 88      3.64                    osd.88  up      1
> 90      3.64                    osd.90  up      1
> 92      3.64                    osd.92  up      1
> -16     40.04           host lab-n8
> 73      3.64                    osd.73  up      1
> 75      3.64                    osd.75  up      1
> 77      3.64                    osd.77  up      1
> 79      3.64                    osd.79  up      1
> 81      3.64                    osd.81  up      1
> 83      3.64                    osd.83  up      1
> 85      3.64                    osd.85  up      1
> 87      3.64                    osd.87  up      1
> 89      3.64                    osd.89  up      1
> 91      3.64                    osd.91  up      1
> 93      3.64                    osd.93  up      1
>
> _______________________________________________
> Ceph-community mailing list
> Ceph-community@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-community-ceph.com
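One pointer while this gathers eyeballs: the 2147483647 entries in those 'up' sets are
CRUSH's placeholder for "couldn't find an OSD for this slot", i.e. the erasure rule is
giving up before it fills the 8th position. A more targeted check than reshuffling data is
to test the rule offline and, if it reports bad mappings for 8 chunks, raise the rule's
retry budget. The commands below are only a sketch: '1' stands in for whatever ruleset id
'ceph osd crush rule dump' reports for the EC pool, the filenames are arbitrary, and
injecting an edited CRUSH map can itself move data, so treat this as a starting point
rather than a prescription.

    # grab and decompile the current CRUSH map
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt

    # see whether the erasure rule can reliably map 8 OSDs
    # ('1' is a placeholder for the EC rule's ruleset id)
    crushtool -i crush.bin --test --rule 1 --num-rep 8 --show-bad-mappings

    # if bad mappings are reported, add a line such as
    #     step set_choose_tries 100
    # as one of the first steps of the erasure rule in crush.txt (before 'step take'),
    # then recompile and inject the edited map
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new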
--

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com