On Wed, Mar 25, 2015 at 1:20 AM, Udo Lembke <ulembke@xxxxxxxxxxxx> wrote:
> Hi,
> due to two more hosts (now 7 storage nodes) I want to create a new
> ec-pool and get a strange effect:
>
> ceph@admin:~$ ceph health detail
> HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2
> pgs stuck undersized; 2 pgs undersized

This is the big clue: you have two undersized PGs!

> pg 22.3e5 is stuck unclean since forever, current state
> active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]

2147483647 is the largest number you can represent in a signed 32-bit
integer. There's an output error of some kind which is fixed elsewhere;
this should be "-1". So for whatever reason (in general it's hard for
CRUSH to select N entries out of N choices), CRUSH hasn't been able to
map an OSD to this slot for you. You'll want to figure out why that is
and fix it.
-Greg

> pg 22.240 is stuck unclean since forever, current state
> active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
> pg 22.3e5 is stuck undersized for 406.614447, current state
> active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
> pg 22.240 is stuck undersized for 406.616563, current state
> active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
> pg 22.3e5 is stuck degraded for 406.614566, current state
> active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
> pg 22.240 is stuck degraded for 406.616679, current state
> active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
> pg 22.3e5 is active+undersized+degraded, acting
> [76,15,82,11,57,29,2147483647]
> pg 22.240 is active+undersized+degraded, acting
> [38,85,17,74,2147483647,10,58]
>
> But I have only 91 OSDs (84 SATA + 7 SSDs), not 2147483647!
> Where the heck did the 2147483647 come from?
>
> I ran the following commands:
> ceph osd erasure-code-profile set 7hostprofile k=5 m=2
> ruleset-failure-domain=host
> ceph osd pool create ec7archiv 1024 1024 erasure 7hostprofile
>
> my version:
> ceph -v
> ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)
>
>
> I found an issue in my crush map - one SSD was in the map twice:
> host ceph-061-ssd {
>         id -16          # do not change unnecessarily
>         # weight 0.000
>         alg straw
>         hash 0  # rjenkins1
> }
> root ssd {
>         id -13          # do not change unnecessarily
>         # weight 0.780
>         alg straw
>         hash 0  # rjenkins1
>         item ceph-01-ssd weight 0.170
>         item ceph-02-ssd weight 0.170
>         item ceph-03-ssd weight 0.000
>         item ceph-04-ssd weight 0.170
>         item ceph-05-ssd weight 0.170
>         item ceph-06-ssd weight 0.050
>         item ceph-07-ssd weight 0.050
>         item ceph-061-ssd weight 0.000
> }
>
> Host ceph-061-ssd doesn't exist, and osd-61 is the SSD from ceph-03-ssd,
> but after fixing the crush map the issue with osd 2147483647 still exists.
>
> Any idea how to fix that?
>
> regards
>
> Udo
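
For anyone following up on Greg's advice to figure out why CRUSH leaves
that seventh slot empty, here is a minimal sketch of one way to
investigate, assuming the standard crushtool workflow; the file names,
the <ruleset> placeholder, and the retry value of 100 are illustrative
and not taken from this thread:

# Dump and decompile the current CRUSH map:
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Test how the pool's rule maps 7 shards; every line printed by
# --show-bad-mappings is a PG that CRUSH could not fully map:
crushtool -i crushmap.bin --test --rule <ruleset> --num-rep 7 --show-bad-mappings

# With k+m equal to the number of hosts, CRUSH can simply run out of
# retries. A common mitigation is to add "step set_choose_tries 100" to
# the erasure rule in crushmap.txt, then recompile and inject the map:
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new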