Check 'ceph pg query' on one of the stuck PGs; it will (usually) tell you why it is stuck inactive.
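For example, a minimal sketch (the PG id below is only a placeholder taken from the stuck list):

  ceph pg dump_stuck inactive     # list the PGs that are stuck in a non-active state
  ceph pg 2.1a query | less       # inspect the 'recovery_state' section for the reason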
Also: never run with min_size 1; with only a single copy left accepting writes, you risk losing data or ending up with inconsistent PGs.

2018-05-17 15:48 GMT+02:00 Kevin Olbrich <ko@xxxxxxx>:
I was able to obtain another NVMe to get the HDDs in node1004 into the cluster.
The number of disks (all 1 TB) is now balanced between the racks, but there are still some inactive PGs:

data:
pools: 2 pools, 1536 pgs
objects: 639k objects, 2554 GB
usage: 5167 GB used, 14133 GB / 19300 GB avail
pgs: 1.562% pgs not active
1183/1309952 objects degraded (0.090%)
199660/1309952 objects misplaced (15.242%)
1072 active+clean
405 active+remapped+backfill_wait
35 active+remapped+backfilling
21 activating+remapped
3 activating+undersized+degraded+remapped

ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 18.85289 root default
-16 18.85289 datacenter dc01
-19 18.85289 pod dc01-agg01
-10 8.98700 rack dc01-rack02
-4 4.03899 host node1001
0 hdd 0.90999 osd.0 up 1.00000 1.00000
1 hdd 0.90999 osd.1 up 1.00000 1.00000
5 hdd 0.90999 osd.5 up 1.00000 1.00000
2 ssd 0.43700 osd.2 up 1.00000 1.00000
3 ssd 0.43700 osd.3 up 1.00000 1.00000
4 ssd 0.43700 osd.4 up 1.00000 1.00000
-7 4.94899 host node1002
9 hdd 0.90999 osd.9 up 1.00000 1.00000
10 hdd 0.90999 osd.10 up 1.00000 1.00000
11 hdd 0.90999 osd.11 up 1.00000 1.00000
12 hdd 0.90999 osd.12 up 1.00000 1.00000
6 ssd 0.43700 osd.6 up 1.00000 1.00000
7 ssd 0.43700 osd.7 up 1.00000 1.00000
8 ssd 0.43700 osd.8 up 1.00000 1.00000
-11 9.86589 rack dc01-rack03
-22 5.38794 host node1003
17 hdd 0.90999 osd.17 up 1.00000 1.00000
18 hdd 0.90999 osd.18 up 1.00000 1.00000
24 hdd 0.90999 osd.24 up 1.00000 1.00000
26 hdd 0.90999 osd.26 up 1.00000 1.00000
13 ssd 0.43700 osd.13 up 1.00000 1.00000
14 ssd 0.43700 osd.14 up 1.00000 1.00000
15 ssd 0.43700 osd.15 up 1.00000 1.00000
16 ssd 0.43700 osd.16 up 1.00000 1.00000
-25 4.47795 host node1004
23 hdd 0.90999 osd.23 up 1.00000 1.00000
25 hdd 0.90999 osd.25 up 1.00000 1.00000
27 hdd 0.90999 osd.27 up 1.00000 1.00000
19 ssd 0.43700 osd.19 up 1.00000 1.00000
20 ssd 0.43700 osd.20 up 1.00000 1.00000
21 ssd 0.43700 osd.21 up 1.00000 1.00000
22 ssd 0.43700 osd.22 up 1.00000 1.00000

Pools are size 2, min_size 1 during setup.
The count of PGs in the activating state seems related to the weight of the OSDs, but why do they fail to proceed to active+clean or active+remapped?
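(For reference, the stuck PGs and the OSD sets they map to can be listed with a standard command; the grep filter is just one way to narrow the output:)

  ceph pg dump pgs_brief | grep activating   # shows each stuck PG with its up/acting OSD sets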
Kind regards,
Kevin

2018-05-17 14:05 GMT+02:00 Kevin Olbrich <ko@xxxxxxx>:

Ok, I just waited some time but I still have some "activating" issues:

data:
pools: 2 pools, 1536 pgs
objects: 639k objects, 2554 GB
usage: 5194 GB used, 11312 GB / 16506 GB avail
pgs: 7.943% pgs not active
5567/1309948 objects degraded (0.425%)
195386/1309948 objects misplaced (14.916%)
1147 active+clean
235 active+remapped+backfill_wait
107 activating+remapped
32 active+remapped+backfilling
15 activating+undersized+degraded+remapped

I set these settings during runtime:

ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd 800'
ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'

Sure, mon_max_pg_per_osd is oversized, but this is only temporary. The calculated PG count per OSD is 200.
I searched the net and the bug tracker; most posts suggest osd_max_pg_per_osd_hard_ratio = 32 to fix this issue, but this time I got even more stuck PGs. Any more hints?

Kind regards,
Kevin

2018-05-17 13:37 GMT+02:00 Kevin Olbrich <ko@xxxxxxx>:

PS: The cluster is currently size 2. I used PGCalc on the Ceph website, which by default will place 200 PGs on each OSD.
I read about the protection in the docs and later realized that I should have placed only 100 PGs per OSD.

2018-05-17 13:35 GMT+02:00 Kevin Olbrich <ko@xxxxxxx>:

Hi!

Thanks for your quick reply.
Before I read your mail, I applied the following config to my OSDs:

ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'

Status is now:

data:
pools: 2 pools, 1536 pgs
objects: 639k objects, 2554 GB
usage: 5211 GB used, 11295 GB / 16506 GB avail
pgs: 7.943% pgs not active
5567/1309948 objects degraded (0.425%)
252327/1309948 objects misplaced (19.262%)
1030 active+clean
351 active+remapped+backfill_wait
107 activating+remapped
33 active+remapped+backfilling
15 activating+undersized+degraded+remapped

A little bit better, but still some non-active PGs. I will investigate your other hints!

Thanks
Kevin

2018-05-17 13:30 GMT+02:00 Burkhard Linke <Burkhard.Linke@computational.bio.uni-giessen.de>:

Hi,

You need to resolve the unknown/peering/activating PGs first. You have 1536 PGs; assuming replication size 3, that makes 4608 PG copies. Given 25 OSDs and the heterogeneous host sizes, I assume that some OSDs hold more than 200 PGs. There is a threshold for the number of PGs per OSD; reaching this threshold keeps the OSDs from accepting new PGs.
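(Rough average, assuming an even spread: 1536 PGs × 3 copies = 4608 PG instances, and 4608 / 25 OSDs is roughly 184 per OSD, so the more heavily weighted OSDs can easily end up above a 200-PG limit.)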
On 05/17/2018 01:09 PM, Kevin Olbrich wrote:
Hi!
Today I added some new OSDs (nearly doubled) to my luminous cluster.
I then changed pg(p)_num from 256 to 1024 for that pool because the cluster was
complaining about too few PGs. (I have since noticed that this should better have
been done in smaller steps.)
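(A sketch of the smaller-step approach; the pool name is a placeholder, pg_num can only be increased, and pgp_num should follow each step:)

  ceph osd pool set <pool> pg_num 512
  ceph osd pool set <pool> pgp_num 512
  # wait for peering/backfill to settle, then repeat towards the target of 1024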
This is the current status:
health: HEALTH_ERR
336568/1307562 objects misplaced (25.740%)
Reduced data availability: 128 pgs inactive, 3 pgs peering, 1
pg stale
Degraded data redundancy: 6985/1307562 objects degraded
(0.534%), 19 pgs degraded, 19 pgs undersized
107 slow requests are blocked > 32 sec
218 stuck requests are blocked > 4096 sec
data:
pools: 2 pools, 1536 pgs
objects: 638k objects, 2549 GB
usage: 5210 GB used, 11295 GB / 16506 GB avail
pgs: 0.195% pgs unknown
8.138% pgs not active
6985/1307562 objects degraded (0.534%)
336568/1307562 objects misplaced (25.740%)
855 active+clean
517 active+remapped+backfill_wait
107 activating+remapped
31 active+remapped+backfilling
15 activating+undersized+degraded+remapped
4 active+undersized+degraded+remapped+backfilling
3 unknown
3 peering
1 stale+active+clean
Try to increase the threshold (mon_max_pg_per_osd / max_pg_per_osd_hard_ratio / osd_max_pg_per_osd_hard_ratio, not sure about the exact one, consult the documentation) to allow more PGs on the OSDs. If this is the cause of the problem, the peering and activating states should be resolved within a short time.
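A minimal sketch for checking the current limit and raising it at runtime (the mon id is a placeholder, and 400 is only an example value; persist any change in ceph.conf as well):

  ceph daemon mon.<id> config get mon_max_pg_per_osd       # run on the mon host; shows the value currently in effect
  ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd 400'  # runtime change only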
You can also check the number of PGs per OSD with 'ceph osd df'; the last column is the current number of PGs.

Size 2 within the crush rule, or size 2 for the two pools?
OSD tree:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 16.12177 root default
-16 16.12177 datacenter dc01
-19 16.12177 pod dc01-agg01
-10 8.98700 rack dc01-rack02
-4 4.03899 host node1001
0 hdd 0.90999 osd.0 up 1.00000 1.00000
1 hdd 0.90999 osd.1 up 1.00000 1.00000
5 hdd 0.90999 osd.5 up 1.00000 1.00000
2 ssd 0.43700 osd.2 up 1.00000 1.00000
3 ssd 0.43700 osd.3 up 1.00000 1.00000
4 ssd 0.43700 osd.4 up 1.00000 1.00000
-7 4.94899 host node1002
9 hdd 0.90999 osd.9 up 1.00000 1.00000
10 hdd 0.90999 osd.10 up 1.00000 1.00000
11 hdd 0.90999 osd.11 up 1.00000 1.00000
12 hdd 0.90999 osd.12 up 1.00000 1.00000
6 ssd 0.43700 osd.6 up 1.00000 1.00000
7 ssd 0.43700 osd.7 up 1.00000 1.00000
8 ssd 0.43700 osd.8 up 1.00000 1.00000
-11 7.13477 rack dc01-rack03
-22 5.38678 host node1003
17 hdd 0.90970 osd.17 up 1.00000 1.00000
18 hdd 0.90970 osd.18 up 1.00000 1.00000
24 hdd 0.90970 osd.24 up 1.00000 1.00000
26 hdd 0.90970 osd.26 up 1.00000 1.00000
13 ssd 0.43700 osd.13 up 1.00000 1.00000
14 ssd 0.43700 osd.14 up 1.00000 1.00000
15 ssd 0.43700 osd.15 up 1.00000 1.00000
16 ssd 0.43700 osd.16 up 1.00000 1.00000
-25 1.74799 host node1004
19 ssd 0.43700 osd.19 up 1.00000 1.00000
20 ssd 0.43700 osd.20 up 1.00000 1.00000
21 ssd 0.43700 osd.21 up 1.00000 1.00000
22 ssd 0.43700 osd.22 up 1.00000 1.00000
The crush rule is set to chooseleaf rack, and size is (temporarily!) set to 2.
Why are PGs stuck in peering and activating?
"ceph df" shows that only 1,5TB are used on the pool, residing on the hdd's
- which would perfectly fit the crush rule....(?)
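(For completeness, the rule mentioned above and the pool settings can be verified with the following; the pool name is a placeholder:)

  ceph osd crush rule dump             # shows the rule steps, e.g. a 'chooseleaf' step with type rack
  ceph osd pool get <pool> crush_rule
  ceph osd pool get <pool> size
  ceph osd pool get <pool> min_size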
Regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90