Re: 5 pgs of 712 stuck in active+remapped

Hi,

As far as I know, this is exactly the problem the new tunables were introduced to solve: if you use 3 replicas with only 3 hosts, CRUSH sometimes doesn't find a solution to place all PGs.
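
If you want to double-check which profile the cluster is actually running, the tunables can be inspected and switched from the CLI (commands as in Jewel):

   # print the currently active crush tunables, including the "profile" field
   ceph osd crush show-tunables

   # explicitly select a profile, e.g. bobtail
   ceph osd crush tunables bobtail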

If you are really stuck with bobtail tunables, I can think of 2 possible workarounds:

1. Add another OSD server.
2. Bad idea, but could work: build your CRUSH rule manually, e.g. pin all primary copies to host ceph1, the first replica to host ceph2 and the second replica to host ceph3 (a rough sketch follows below).
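
Roughly, such a pinned rule could look like this in the decompiled crushmap (rule name and ruleset number are just placeholders; min_size/max_size match your 3 replicas):

   rule pinned_3hosts {
           ruleset 1
           type replicated
           min_size 3
           max_size 3
           # primary always on ceph1, second copy on ceph2, third on ceph3
           step take ceph1
           step choose firstn 1 type osd
           step emit
           step take ceph2
           step choose firstn 1 type osd
           step emit
           step take ceph3
           step choose firstn 1 type osd
           step emit
   }

To try it, export, edit and re-inject the crushmap, then point the pool at the new ruleset (pool name is a placeholder):

   ceph osd getcrushmap -o crushmap.bin
   crushtool -d crushmap.bin -o crushmap.txt
   # add the rule above to crushmap.txt, then recompile and load it
   crushtool -c crushmap.txt -o crushmap.new
   ceph osd setcrushmap -i crushmap.new
   ceph osd pool set <pool> crush_ruleset 1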

Micha Krause

On 08.07.2016 at 05:47, Nathanial Byrnes wrote:
Hello,
     I've got a Jewel cluster (3 nodes, 15 OSDs) running with bobtail tunables (my XenServer cluster uses kernel 3.10 and there's no upgrading that....). I started the cluster out on Hammer, upgraded to Jewel, discovered that optimal tunables would not work, and then set the tunables back to bobtail. Once the re-balancing completed, I was stuck with 1 pg in active+remapped. Repair didn't fix the pg. I then upped the number of pgs from 328 to 712 (oddly I asked for 512, but ended up with 712...); now I have 5 pgs stuck in active+remapped. I also tried re-weighting the pgs a couple of times, but no change.... Here is my osd tree:

ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 15.00000 root default
-2  5.00000     host ceph1
  0  1.00000         osd.0       up  0.95001          1.00000
  1  1.00000         osd.1       up  1.00000          1.00000
  2  1.00000         osd.2       up  1.00000          1.00000
  3  1.00000         osd.3       up  0.90002          1.00000
  4  1.00000         osd.4       up  1.00000          1.00000
-3  5.00000     host ceph3
10  1.00000         osd.10      up  1.00000          1.00000
11  1.00000         osd.11      up  1.00000          1.00000
12  1.00000         osd.12      up  1.00000          1.00000
13  1.00000         osd.13      up  1.00000          1.00000
14  1.00000         osd.14      up  1.00000          1.00000
-4  5.00000     host ceph2
  5  1.00000         osd.5       up  1.00000          1.00000
  6  1.00000         osd.6       up  1.00000          1.00000
  7  1.00000         osd.7       up  1.00000          1.00000
  8  1.00000         osd.8       up  1.00000          1.00000
  9  1.00000         osd.9       up  1.00000          1.00000


     Any suggestions on how to troubleshoot or repair this?

     Thanks and Regards,
     Nate



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



