EC Pool Stuck w/ holes in PG Mapping

I'm dealing with a situation in which several placement groups in an
EC pool are stuck. The EC pool is configured as 6+2 (pool 15) with a
host failure domain.
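For reference, a 6+2 profile with a host failure domain on
firefly/hammer would have been created with something like the
following (the profile and pool names here are illustrative
assumptions, not taken from this cluster):

```shell
# Illustrative sketch only -- "ec62" and "mypool" are made-up names.
# On firefly/hammer the failure-domain key was ruleset-failure-domain
# (later renamed to crush-failure-domain).
ceph osd erasure-code-profile set ec62 k=6 m=2 ruleset-failure-domain=host
ceph osd pool create mypool 2048 2048 erasure ec62
```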

In this scenario, one of the nodes in the cluster was torn down and
recreated: its OSDs were marked as lost and then rebuilt from scratch.
After the node had its OSDs rebuilt, 4 PGs are stuck, with the
infamous NONE (2147483647) for 2 of the 8 OSDs in each acting set. As
it currently stands, there are 9 nodes containing 11 OSDs each. One of
the nodes has all of its OSDs marked as out, and one additional OSD
(osd.93) is also marked as out of the cluster (which is confusing,
because osd.93 is the acting primary for 15.7fb).
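For anyone hitting this via a search: 2147483647 is 0x7fffffff, i.e.
2^31 - 1, the CRUSH_ITEM_NONE sentinel CRUSH emits when it fails to
map any OSD to a shard, so each occurrence in an acting set is a hole
rather than a real OSD id. A quick sanity check:

```shell
# 2147483647 == 2^31 - 1, i.e. CRUSH_ITEM_NONE ("no OSD mapped to this shard")
none=$(( (1 << 31) - 1 ))
echo "$none"             # 2147483647
printf '0x%x\n' "$none"  # 0x7fffffff
```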

The cluster was on firefly when all of this started, but was upgraded
to hammer (v0.94.10) in the hope that the crush tunables in hammer
might bring some improvement. The choose_tries for the EC ruleset
within the crushmap was increased to 100, and crushtool testing didn't
show any issues with mapping the OSDs into placement groups. The
latest osdmap was extracted from one of the monitors, and
osdmaptool --test-map-pg 15.7fb osdmap.bin shows the same holes in the
acting set. Testing with all OSDs marked up and in (--mark-up-in) also
produced the NONE holes. The only thing that removed the holes was
removing the pg_temp entries from the mapping as well.
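For completeness, the testing described above amounted to roughly the
following (file names and the rule id are assumptions; none of this is
runnable without the extracted crushmap/osdmap from this cluster):

```shell
# Check the EC rule for bad mappings after raising choose_tries
# (rule id 1 is an assumption -- substitute the EC ruleset's id).
crushtool -i crush.map --test --rule 1 --num-rep 8 --show-bad-mappings

# Map a single PG against the extracted osdmap
osdmaptool osdmap.bin --test-map-pg 15.7fb

# Same, but pretending every OSD is up and in
osdmaptool osdmap.bin --mark-up-in --test-map-pg 15.7fb

# Clear pg_temp/primary_temp (modifies osdmap.bin in place), then re-test;
# only this removed the NONE holes
osdmaptool osdmap.bin --clear-temp
osdmaptool osdmap.bin --test-map-pg 15.7fb
```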

It's not really clear how objects became unfound, since the loss of a
single node with this configuration shouldn't lose any objects.
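The back-of-the-envelope reasoning, as a sketch: with one shard per
host (host failure domain) and only one host lost, every PG should
still have at least k shards available:

```shell
# k data shards + m coding shards, one shard per host
k=6; m=2; lost_hosts=1
remaining=$(( k + m - lost_hosts ))
# An object remains reconstructible as long as k shards survive
if [ "$remaining" -ge "$k" ]; then
    echo "single-host loss: all objects still reconstructible"
fi
```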

Any help and feedback is appreciated.

Here's the ceph health detail output; additional output for some of
the states is included in pastebins below:

HEALTH_ERR 4 pgs degraded; 3 pgs inconsistent; 3 pgs recovering; 3 pgs
stuck degraded; 4 pgs stuck unclean; 3 pgs stuck undersized; 4 pgs
undersized; 100 requests are blocked > 32 sec; 1 osds have slow
requests; recovery 352034/722671544 objects degraded (0.049%);
recovery 1404947/722671544 objects misplaced (0.194%); recovery
192/90258551 unfound (0.000%); 7 scrub errors; too many PGs per OSD
(753 > max 300); noout flag(s) set
pg 15.7fb is stuck unclean for 1745750.479304, current state
active+recovering+undersized+degraded+remapped, last acting
[93,12,2147483647,7,39,80,75,2147483647]
pg 15.38a is stuck unclean for 1747276.253098, current state
active+undersized+degraded+remapped, last acting
[2147483647,95,39,80,29,8,73,2147483647]
pg 15.ee is stuck unclean for 1745723.882213, current state
active+recovering+undersized+degraded+remapped, last acting
[2147483647,20,93,80,2147483647,39,15,69]
pg 15.33c is stuck unclean for 1613257.331259, current state
active+recovering+undersized+degraded+remapped, last acting
[38,80,2147483647,2147483647,92,69,26,39]
pg 15.7fb is stuck undersized for 48918.444257, current state
active+recovering+undersized+degraded+remapped, last acting
[93,12,2147483647,7,39,80,75,2147483647]
pg 15.ee is stuck undersized for 48933.042271, current state
active+recovering+undersized+degraded+remapped, last acting
[2147483647,20,93,80,2147483647,39,15,69]
pg 15.33c is stuck undersized for 48990.546803, current state
active+recovering+undersized+degraded+remapped, last acting
[38,80,2147483647,2147483647,92,69,26,39]
pg 15.7fb is stuck degraded for 48918.445037, current state
active+recovering+undersized+degraded+remapped, last acting
[93,12,2147483647,7,39,80,75,2147483647]
pg 15.ee is stuck degraded for 48933.043052, current state
active+recovering+undersized+degraded+remapped, last acting
[2147483647,20,93,80,2147483647,39,15,69]
pg 15.33c is stuck degraded for 48990.547584, current state
active+recovering+undersized+degraded+remapped, last acting
[38,80,2147483647,2147483647,92,69,26,39]
pg 15.7fb is active+recovering+undersized+degraded+remapped, acting
[93,12,2147483647,7,39,80,75,2147483647], 88 unfound
pg 15.7dd is active+clean+inconsistent, acting [94,83,78,25,6,55,51,9]
pg 15.639 is active+clean+inconsistent, acting [50,10,77,95,57,80,23,29]
pg 15.38a is active+undersized+degraded+remapped, acting
[2147483647,95,39,80,29,8,73,2147483647], 27 unfound
pg 15.33c is active+recovering+undersized+degraded+remapped, acting
[38,80,2147483647,2147483647,92,69,26,39], 53 unfound
pg 15.2c0 is active+clean+inconsistent, acting [14,98,36,70,53,65,88,42]
pg 15.ee is active+recovering+undersized+degraded+remapped, acting
[2147483647,20,93,80,2147483647,39,15,69], 24 unfound
100 ops are blocked > 262.144 sec
100 ops are blocked > 262.144 sec on osd.95
1 osds have slow requests
recovery 352034/722671544 objects degraded (0.049%)
recovery 1404947/722671544 objects misplaced (0.194%)
recovery 192/90258551 unfound (0.000%)
7 scrub errors
too many PGs per OSD (753 > max 300)
noout flag(s) set


pg query for 15.7fb - http://paste.ubuntu.com/25223721/
pg query for 15.ee - http://paste.ubuntu.com/25223724/
pg query for 15.33c - http://paste.ubuntu.com/25223728/
pg query for 15.38a - http://paste.ubuntu.com/25223731/
osd tree, ceph -s, and ceph osd dump output are located at:
http://paste.ubuntu.com/25223759/

The OSD debug level was cranked up to 30 for a few of the OSDs, and
their logs were uploaded as a gzipped tarball (large download, ~1.1
GB): http://people.canonical.com/~wolsen/ceph-stuck-ec-pool/ceph-osd-logs.2017-07-31.tgz

Thanks,

----
Billy Olsen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


