Hello all,
We have an issue with our Ceph cluster: 'ceph -s' shows that several
requests are blocked, but querying the PGs listed by 'ceph health
detail' shows that they are either active+clean or do not currently
exist.
OSD 32 appears to be working fine, and the cluster is performing as
expected with no clients seemingly affected.
Note that we have just upgraded to Luminous and, despite having "mon max
pg per osd = 400" set in ceph.conf, we still get the message "too many
PGs per OSD (278 > max 200)".
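The "max 200" in the message suggests that whichever daemon raises the check is running with a limit of 200 rather than the 400 from ceph.conf. A minimal sketch for pulling both numbers out of the health text so they can be compared with the configured value. The function name pg_warning_numbers is made up for illustration, and the commented admin-socket call assumes it is run on the host of the named daemon:

```shell
# Hypothetical helper: read health text on stdin and print the current
# and maximum PGs-per-OSD figures from the "too many PGs" warning.
pg_warning_numbers() {
    sed -n 's/.*too many PGs per OSD (\([0-9][0-9]*\) > max \([0-9][0-9]*\)).*/current=\1 max=\2/p'
}

# Example with the warning line from 'ceph -s':
echo 'too many PGs per OSD (278 > max 200)' | pg_warning_numbers

# To see the value a running daemon actually uses (assumption: run on
# that daemon's host; the daemon name is taken from the quorum list):
#   ceph daemon mon.bbpocn01 config get mon_max_pg_per_osd
```

If the admin socket reports 200 while ceph.conf says 400, the setting is probably in a section that daemon does not read, or the daemon has not been restarted since the change. That is a guess, not something confirmed on this cluster.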
In order to improve the situation above, I removed several pools that
were not used anymore. I assume the PGs that ceph cannot find now are
related to this pool deletion.
Does anyone have any ideas on how to get out of this state?
Details below - and full 'ceph health detail' attached to this email.
Kind regards,
Ben Morrice
[root@ceph03 ~]# ceph -s
  cluster:
    id:     6c21c4ba-9c4d-46ef-93a3-441b8055cdc6
    health: HEALTH_WARN
            Degraded data redundancy: 443765/14311983 objects degraded (3.101%), 162 pgs degraded, 241 pgs undersized
            75 slow requests are blocked > 32 sec. Implicated osds 32
            too many PGs per OSD (278 > max 200)

  services:
    mon: 5 daemons, quorum bbpocn01,bbpocn02,bbpocn03,bbpocn04,bbpocn07
    mgr: bbpocn03(active, starting)
    osd: 36 osds: 36 up, 36 in
    rgw: 1 daemon active

  data:
    pools:   24 pools, 3440 pgs
    objects: 4.77M objects, 7.69TiB
    usage:   23.1TiB used, 104TiB / 127TiB avail
    pgs:     443765/14311983 objects degraded (3.101%)
             3107 active+clean
             170  active+undersized
             109  active+undersized+degraded
             43   active+recovery_wait+degraded
             10   active+recovering+degraded
             1    active+recovery_wait
[root@ceph03 ~]# for i in `ceph health detail | grep stuck | awk '{print $2}'`; do echo -n "$i: "; ceph pg $i query -f plain | cut -d: -f2 | cut -d\" -f2; done
150.270: active+clean
150.2a0: active+clean
150.2b6: active+clean
150.2c2: active+clean
150.2cc: active+clean
150.2d5: active+clean
150.2d6: active+clean
150.2e1: active+clean
150.2ef: active+clean
150.2f5: active+clean
150.2f7: active+clean
150.2fc: active+clean
150.315: active+clean
150.318: active+clean
150.31a: active+clean
150.320: active+clean
150.326: active+clean
150.36e: active+clean
150.380: active+clean
150.389: active+clean
150.3a4: active+clean
150.3ad: active+clean
150.3b4: active+clean
150.3bb: active+clean
150.3ce: active+clean
150.3d0: active+clean
150.3d8: active+clean
150.3e0: active+clean
150.3f6: active+clean
165.24c: Error ENOENT: problem getting command descriptions from pg.165.24c
165.28f: Error ENOENT: problem getting command descriptions from pg.165.28f
165.2b3: Error ENOENT: problem getting command descriptions from pg.165.2b3
165.2b4: Error ENOENT: problem getting command descriptions from pg.165.2b4
165.2d6: Error ENOENT: problem getting command descriptions from pg.165.2d6
165.2f4: Error ENOENT: problem getting command descriptions from pg.165.2f4
165.2fd: Error ENOENT: problem getting command descriptions from pg.165.2fd
165.30f: Error ENOENT: problem getting command descriptions from pg.165.30f
165.322: Error ENOENT: problem getting command descriptions from pg.165.322
165.325: Error ENOENT: problem getting command descriptions from pg.165.325
165.334: Error ENOENT: problem getting command descriptions from pg.165.334
165.36e: Error ENOENT: problem getting command descriptions from pg.165.36e
165.37c: Error ENOENT: problem getting command descriptions from pg.165.37c
165.382: Error ENOENT: problem getting command descriptions from pg.165.382
165.387: Error ENOENT: problem getting command descriptions from pg.165.387
165.3af: Error ENOENT: problem getting command descriptions from pg.165.3af
165.3da: Error ENOENT: problem getting command descriptions from pg.165.3da
165.3e0: Error ENOENT: problem getting command descriptions from pg.165.3e0
165.3e2: Error ENOENT: problem getting command descriptions from pg.165.3e2
165.3e9: Error ENOENT: problem getting command descriptions from pg.165.3e9
165.3fb: Error ENOENT: problem getting command descriptions from pg.165.3fb
[root@ceph03 ~]# ceph pg 165.24c query
Error ENOENT: problem getting command descriptions from pg.165.24c
[root@ceph03 ~]# ceph pg 165.24c delete
Error ENOENT: problem getting command descriptions from pg.165.24c
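The one-line loop above can be split into smaller pieces that are easier to check. A rough sketch, assuming the "pg <id> is stuck ..." line format shown in the attached 'ceph health detail'. The function names are made up for illustration:

```shell
# Hypothetical helpers; both read 'ceph health detail' text on stdin.

# Print the PG id from every "pg <id> is stuck ..." line.
stuck_pgs() {
    awk '$1 == "pg" && $4 == "stuck" { print $2 }'
}

# Count stuck PGs per pool. The pool id is the part of the PG id
# before the dot, e.g. 165 for 165.24c.
stuck_pgs_by_pool() {
    stuck_pgs | cut -d. -f1 | sort -n | uniq -c | awk '{ print $2 ": " $1 }'
}

# Intended use on a live cluster:
#   ceph health detail | stuck_pgs_by_pool
```

Comparing the pool ids this prints with the ids listed by 'ceph osd pool ls detail' should show whether pool 165, whose PGs all return ENOENT, is one of the pools that were deleted.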
______________________________________________________________________
Ben Morrice | e: ben.morrice@xxxxxxx | t: +41-21-693-9670
EPFL / BBP
Biotech Campus
Chemin des Mines 9
1202 Geneva
Switzerland
HEALTH_WARN Degraded data redundancy: 443765/14311983 objects degraded (3.101%), 162 pgs degraded, 241 pgs undersized; 75 slow requests are blocked > 32 sec. Implicated osds 32; too many PGs per OSD (278 > max 200)
pg 150.270 is stuck undersized for 1871.987162, current state active+undersized, last acting [17,30]
pg 150.2a0 is stuck undersized for 1871.988539, current state active+undersized, last acting [16,24]
pg 150.2b6 is stuck undersized for 1871.984670, current state active+undersized, last acting [26,28]
pg 150.2c2 is stuck undersized for 1871.985571, current state active+undersized, last acting [10,30]
pg 150.2cc is stuck undersized for 1871.991733, current state active+undersized, last acting [35,23]
pg 150.2d5 is stuck undersized for 1871.992692, current state active+undersized, last acting [15,24]
pg 150.2d6 is stuck undersized for 1871.985410, current state active+undersized, last acting [23,34]
pg 150.2e1 is stuck undersized for 1871.990823, current state active+undersized, last acting [35,13]
pg 150.2ef is stuck undersized for 1871.990259, current state active+undersized, last acting [25,33]
pg 150.2f5 is stuck undersized for 1871.988578, current state active+undersized, last acting [35,11]
pg 150.2f7 is stuck undersized for 1871.989826, current state active+undersized, last acting [19,12]
pg 150.2fc is stuck undersized for 1871.987132, current state active+undersized, last acting [13,25]
pg 150.315 is stuck undersized for 1871.988419, current state active+undersized, last acting [24,12]
pg 150.318 is stuck undersized for 1871.985784, current state active+undersized, last acting [28,23]
pg 150.31a is stuck undersized for 1871.988659, current state active+undersized, last acting [23,30]
pg 150.320 is stuck undersized for 1871.986622, current state active+undersized, last acting [29,24]
pg 150.326 is stuck undersized for 1871.989506, current state active+undersized, last acting [29,10]
pg 150.36e is stuck undersized for 1871.991475, current state active+undersized, last acting [12,20]
pg 150.380 is stuck undersized for 1871.990961, current state active+undersized+degraded, last acting [23,13]
pg 150.389 is stuck undersized for 1871.984920, current state active+undersized+degraded, last acting [26,12]
pg 150.3a4 is stuck undersized for 1871.992132, current state active+undersized, last acting [22,34]
pg 150.3ad is stuck undersized for 1871.991914, current state active+undersized, last acting [15,33]
pg 150.3b4 is stuck undersized for 1871.986881, current state active+undersized, last acting [28,19]
pg 150.3bb is stuck undersized for 1871.987502, current state active+undersized, last acting [19,12]
pg 150.3ce is stuck undersized for 1871.989547, current state active+undersized, last acting [24,9]
pg 150.3d0 is stuck undersized for 1871.988650, current state active+undersized, last acting [15,18]
pg 150.3d8 is stuck undersized for 1871.985067, current state active+undersized, last acting [20,16]
pg 150.3e0 is stuck undersized for 1871.986621, current state active+undersized, last acting [23,10]
pg 150.3f6 is stuck undersized for 1871.986451, current state active+undersized, last acting [13,18]
pg 165.24c is stuck undersized for 1871.989838, current state active+undersized, last acting [31,13]
pg 165.28f is stuck undersized for 1871.987943, current state active+undersized, last acting [28,9]
pg 165.2b3 is stuck undersized for 1871.986314, current state active+undersized, last acting [32,11]
pg 165.2b4 is stuck undersized for 1871.990227, current state active+undersized, last acting [30,12]
pg 165.2d6 is stuck undersized for 1871.987215, current state active+undersized, last acting [32,25]
pg 165.2f4 is stuck undersized for 1871.992309, current state active+undersized, last acting [27,20]
pg 165.2fd is stuck undersized for 1871.992173, current state active+undersized, last acting [18,15]
pg 165.30f is stuck undersized for 1871.988641, current state active+undersized, last acting [24,10]
pg 165.322 is stuck undersized for 1871.992408, current state active+undersized, last acting [20,33]
pg 165.325 is stuck undersized for 1871.991148, current state active+undersized, last acting [24,28]
pg 165.334 is stuck undersized for 1871.989945, current state active+undersized, last acting [34,25]
pg 165.33e is active+undersized+degraded, acting [21,10]
pg 165.36e is stuck undersized for 1871.991843, current state active+undersized, last acting [24,12]
pg 165.37c is stuck undersized for 1871.989140, current state active+undersized, last acting [31,23]
pg 165.382 is stuck undersized for 1871.991045, current state active+undersized, last acting [10,20]
pg 165.387 is stuck undersized for 1871.987867, current state active+undersized, last acting [30,12]
pg 165.3af is stuck undersized for 1871.987671, current state active+undersized, last acting [22,34]
pg 165.3da is stuck undersized for 1871.992028, current state active+undersized, last acting [20,9]
pg 165.3e0 is stuck undersized for 1871.990471, current state active+undersized, last acting [24,13]
pg 165.3e2 is stuck undersized for 1871.990954, current state active+undersized, last acting [28,9]
pg 165.3e9 is stuck undersized for 1871.991001, current state active+undersized, last acting [24,13]
pg 165.3fb is stuck undersized for 1871.992232, current state active+undersized, last acting [25,9]
22 ops are blocked > 2097.15 sec
24 ops are blocked > 1048.58 sec
5 ops are blocked > 524.288 sec
1 ops are blocked > 262.144 sec
22 ops are blocked > 131.072 sec
1 ops are blocked > 32.768 sec
osd.32 has blocked requests > 2097.15 sec
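To relate the blocked requests to the PG list above, the same text can be summarised in two ways. A minimal sketch, again assuming the line formats shown in this output. The function names are made up for illustration:

```shell
# Hypothetical helpers; both read 'ceph health detail' text on stdin.

# Sum the counts from every "<n> ops are blocked > <t> sec" line.
total_blocked_ops() {
    awk '$2 == "ops" && $4 == "blocked" { total += $1 } END { print total + 0 }'
}

# Print each PG whose acting set (the last field, e.g. [32,11])
# contains the OSD id given as the first argument.
pgs_with_osd() {
    awk -v osd="$1" '
        $1 == "pg" {
            set = $NF
            gsub(/\[|\]/, "", set)        # strip the surrounding brackets
            n = split(set, ids, ",")
            for (i = 1; i <= n; i++)
                if (ids[i] == osd) { print $2; break }
        }'
}

# The operations themselves can be inspected on the OSD host with:
#   ceph daemon osd.32 dump_ops_in_flight
```

On the output above, the per-age counts add up to the 75 slow requests in the summary, and the only listed PGs with osd.32 in their acting set are 165.2b3 and 165.2d6, both in pool 165. That may mean the blocked requests belong to the deleted pool, but this is only a guess from the truncated list.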