As per a previous thread, my PG counts are set too high. I tried adjusting "mon max pg per osd" higher and higher, which did clear the error (I restarted the monitors and managers each time; a sketch of how I raised the limits is below), but data simply won't move around the cluster. If I stop the primary OSD of an incomplete PG, the cluster just shows the affected PGs as active+undersized+degraded:

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3
    mgr: mon3(active), standbys: mon1, mon2
    osd: 43 osds: 43 up, 43 in

  data:
    pools:   11 pools, 36896 pgs
    objects: 8148k objects, 10486 GB
    usage:   21532 GB used, 135 TB / 156 TB avail
    pgs:     0.043% pgs unknown
             0.011% pgs not active
             362942/16689272 objects degraded (2.175%)
             34483 active+clean
             2393  active+undersized+degraded
             16    unknown
             3     incomplete
             1     down

The 16 unknown PGs are from my attempt to set up a new pool. Creating the pool succeeded, but when I tried to copy an existing pool into it, the command just sat there (a sketch of those commands is below as well). The idea was to copy the pools with oversized PG counts into new pools and then delete the old ones. I really didn't want to move the data, but the issue needs to be dealt with.

If I start the OSD back up, the cluster goes back to:

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3
    mgr: mon3(active), standbys: mon1, mon2
    osd: 43 osds: 43 up, 43 in

  data:
    pools:   11 pools, 36896 pgs
    objects: 8148k objects, 10486 GB
    usage:   21533 GB used, 135 TB / 156 TB avail
    pgs:     0.041% pgs unknown
             0.014% pgs not active
             36876 active+clean
             16    unknown
             4     incomplete

The cluster was upgraded last week from Hammer 0.94 (without issues) to Jewel and then to Luminous 12.2.2, using the latest ceph-deploy. The issue at the moment is that data is not moving, either for recovery or for newly written data (new writes basically just time out). I also adjusted "osd max pg per osd hard ratio" to 5, but that didn't seem to trigger any data movement either; I restarted the OSDs each time I changed it. The data just won't finish moving.
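For reference, this is roughly how I have been raising both limits. The mon value below is only an example of the kind of number I tried (I kept increasing it); the hard ratio of 5 is what I actually set. I put the settings in ceph.conf and restarted the daemons each time, though the same options can also be injected at runtime:

  # ceph.conf on the mon and OSD hosts, followed by a daemon restart
  [global]
  mon max pg per osd = 400                  # example value only; I kept raising this
  osd max pg per osd hard ratio = 5

  # or injected at runtime (I restarted the daemons afterwards anyway)
  ceph tell mon.* injectargs '--mon_max_pg_per_osd 400'
  ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio 5'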
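The pool-copy attempt was essentially the following; the pool name and pg_num here are placeholders rather than my real ones, and the cppool step is the part that just sits there:

  ceph osd pool create images-new 256        # new pool with a sane pg_num (placeholder name/size)
  rados cppool images images-new             # copy the old pool's objects; this hangs indefinitely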
"ceph -w" shows this:

  2018-01-10 07:49:27.715163 osd.20 [WRN] slow request 960.675164 seconds old, received at 2018-01-10 07:33:27.039907: osd_op(client.3542508.0:4097 14.0 14.50e8d0b0 (undecoded) ondisk+write+known_if_redirected e125984) currently queued_for_pg

"ceph health detail" shows this:

  HEALTH_ERR Reduced data availability: 20 pgs inactive, 4 pgs incomplete; Degraded data redundancy: 20 pgs unclean; 2 slow requests are blocked > 32 sec; 66 stuck requests are blocked > 4096 sec
  PG_AVAILABILITY Reduced data availability: 20 pgs inactive, 4 pgs incomplete
      pg 11.720 is incomplete, acting [21,10]
      pg 11.9ab is incomplete, acting [14,2]
      pg 11.9fb is incomplete, acting [32,43]
      pg 11.c13 is incomplete, acting [42,26]
      pg 14.0 is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.1 is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.2 is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.3 is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.4 is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.5 is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.6 is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.7 is stuck inactive for 1046.844458, current state creating+activating, last acting [21,40,5]
      pg 14.8 is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.9 is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.a is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.b is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.c is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.d is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.e is stuck inactive for 1046.844458, current state unknown, last acting []
      pg 14.f is stuck inactive for 1046.844458, current state unknown, last acting []
  PG_DEGRADED Degraded data redundancy: 20 pgs unclean
      pg 11.720 is stuck unclean since forever, current state incomplete, last acting [21,10]
      pg 11.9ab is stuck unclean since forever, current state incomplete, last acting [14,2]
      pg 11.9fb is stuck unclean since forever, current state incomplete, last acting [32,43]
      pg 11.c13 is stuck unclean since forever, current state incomplete, last acting [42,26]
      pg 14.0 is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.1 is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.2 is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.3 is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.4 is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.5 is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.6 is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.7 is stuck unclean for 1046.844458, current state creating+activating, last acting [21,40,5]
      pg 14.8 is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.9 is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.a is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.b is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.c is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.d is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.e is stuck unclean for 1046.844458, current state unknown, last acting []
      pg 14.f is stuck unclean for 1046.844458, current state unknown, last acting []
  REQUEST_SLOW 2 slow requests are blocked > 32 sec
      2 ops are blocked > 1048.58 sec
      osds 15,20 have blocked requests > 1048.58 sec
  REQUEST_STUCK 66 stuck requests are blocked > 4096 sec
      66 ops are blocked > 4194.3 sec
      osds 14,32,42 have stuck requests > 4194.3 sec

Any help would be appreciated. Right now the data can be read, but that's about it; the cluster is not writable.

-Brent
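P.S. If it would help with the incomplete PGs, I can post the output of something like the following for one of them (11.720 is one of the four listed above):

  ceph pg 11.720 query                # detailed peering/recovery state for that PG
  ceph pg dump_stuck inactive         # list all stuck-inactive PGs
  ceph osd df                         # per-OSD utilisation and PG counts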