Hi,

On 05/17/2018 01:09 PM, Kevin Olbrich wrote:
Hi! Today I added some new OSDs (nearly doubled) to my luminous cluster. I then changed pg(p)_num from 256 to 1024 for that pool because it was complaining about too few PGs. (I realize now that this should have been done in smaller steps.) This is the current status:

  health: HEALTH_ERR
          336568/1307562 objects misplaced (25.740%)
          Reduced data availability: 128 pgs inactive, 3 pgs peering, 1 pg stale
          Degraded data redundancy: 6985/1307562 objects degraded (0.534%), 19 pgs degraded, 19 pgs undersized
          107 slow requests are blocked > 32 sec
          218 stuck requests are blocked > 4096 sec

  data:
    pools:   2 pools, 1536 pgs
    objects: 638k objects, 2549 GB
    usage:   5210 GB used, 11295 GB / 16506 GB avail
    pgs:     0.195% pgs unknown
             8.138% pgs not active
             6985/1307562 objects degraded (0.534%)
             336568/1307562 objects misplaced (25.740%)
             855 active+clean
             517 active+remapped+backfill_wait
             107 activating+remapped
              31 active+remapped+backfilling
              15 activating+undersized+degraded+remapped
               4 active+undersized+degraded+remapped+backfilling
               3 unknown
               3 peering
               1 stale+active+clean
You need to resolve the unknown/peering/activating PGs first. You have 1536 PGs; assuming replication size 3, that makes 4608 PG copies. Given 25 OSDs and the heterogeneous host sizes, I assume some OSDs hold more than 200 PGs. There is a per-OSD threshold for the number of PGs; once an OSD reaches it, it stops accepting new PGs.
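You can double-check that calculation by querying the replication size and PG count of the pool (the pool name below is just a placeholder, replace it with yours):

  # replication size and PG count of a pool
  ceph osd pool get <pool> size
  ceph osd pool get <pool> pg_num

  # back-of-the-envelope: 1536 PGs * 3 replicas / 25 OSDs ~ 184 PG copies per OSD on average,
  # so with the uneven host weights some OSDs will end up well above 200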
Try increasing the threshold (mon_max_pg_per_osd and/or osd_max_pg_per_osd_hard_ratio; I'm not sure which one applies in your case, consult the documentation) to allow more PGs per OSD. If this is the cause of the problem, the peering and activating PGs should resolve within a short time.
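A rough sketch of what I would try (the values are only examples, not recommendations, and some options may warn that they are not changeable at runtime; in that case put them into ceph.conf and restart the daemons):

  # check what the daemons currently use (osd.0 just as an example)
  ceph daemon osd.0 config show | grep pg_per_osd

  # raise the limits at runtime; 400 and 3 are example values
  ceph tell mon.* injectargs '--mon_max_pg_per_osd=400'
  ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio=3'

  # to persist across restarts, add to the [global] section of ceph.conf:
  #   mon_max_pg_per_osd = 400
  #   osd_max_pg_per_osd_hard_ratio = 3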
You can also check the number of PGs per OSD with 'ceph osd df'; the last column is the current number of PGs.
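For example, to list the OSDs with the most PGs (the awk field assumes PGS is the last column, which is the case on luminous but may differ on other releases):

  # print "osd.<id> <pg count>", sorted by PG count, highest first
  ceph osd df | awk '$1 ~ /^[0-9]+$/ {print "osd." $1, $NF}' | sort -k2 -rn | head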
OSD tree:

  ID  CLASS WEIGHT   TYPE NAME                              STATUS REWEIGHT PRI-AFF
  -1        16.12177 root default
  -16       16.12177     datacenter dc01
  -19       16.12177         pod dc01-agg01
  -10        8.98700             rack dc01-rack02
  -4         4.03899                 host node1001
   0   hdd   0.90999                     osd.0                  up  1.00000 1.00000
   1   hdd   0.90999                     osd.1                  up  1.00000 1.00000
   5   hdd   0.90999                     osd.5                  up  1.00000 1.00000
   2   ssd   0.43700                     osd.2                  up  1.00000 1.00000
   3   ssd   0.43700                     osd.3                  up  1.00000 1.00000
   4   ssd   0.43700                     osd.4                  up  1.00000 1.00000
  -7         4.94899                 host node1002
   9   hdd   0.90999                     osd.9                  up  1.00000 1.00000
  10   hdd   0.90999                     osd.10                 up  1.00000 1.00000
  11   hdd   0.90999                     osd.11                 up  1.00000 1.00000
  12   hdd   0.90999                     osd.12                 up  1.00000 1.00000
   6   ssd   0.43700                     osd.6                  up  1.00000 1.00000
   7   ssd   0.43700                     osd.7                  up  1.00000 1.00000
   8   ssd   0.43700                     osd.8                  up  1.00000 1.00000
  -11        7.13477             rack dc01-rack03
  -22        5.38678                 host node1003
  17   hdd   0.90970                     osd.17                 up  1.00000 1.00000
  18   hdd   0.90970                     osd.18                 up  1.00000 1.00000
  24   hdd   0.90970                     osd.24                 up  1.00000 1.00000
  26   hdd   0.90970                     osd.26                 up  1.00000 1.00000
  13   ssd   0.43700                     osd.13                 up  1.00000 1.00000
  14   ssd   0.43700                     osd.14                 up  1.00000 1.00000
  15   ssd   0.43700                     osd.15                 up  1.00000 1.00000
  16   ssd   0.43700                     osd.16                 up  1.00000 1.00000
  -25        1.74799                 host node1004
  19   ssd   0.43700                     osd.19                 up  1.00000 1.00000
  20   ssd   0.43700                     osd.20                 up  1.00000 1.00000
  21   ssd   0.43700                     osd.21                 up  1.00000 1.00000
  22   ssd   0.43700                     osd.22                 up  1.00000 1.00000

The crush rule is set to chooseleaf rack and (temporarily!) to size 2. Why are PGs stuck in peering and activating? "ceph df" shows that only 1.5 TB are used on the pool, residing on the HDDs - which would perfectly fit the crush rule... (?)
Size 2 within the crush rule or size 2 for the two pools?

Regards,
Burkhard