Can you please advise how to fix this (manually)? My cluster has not been healthy for 14 days now.

On 24.09.2019 at 13:35, Burkhard Linke wrote:
> Hi,
>
> you need to fix the non-active PGs first. They are also probably the
> reason for the blocked requests.
>
> Regards,
>
> Burkhard
>
> On 9/24/19 1:30 PM, Thomas wrote:
>> Hi,
>> ceph health reports
>>   1 MDSs report slow metadata IOs
>>   1 MDSs report slow requests
>>
>> This is the complete output of ceph -s:
>> root@ld3955:~# ceph -s
>>   cluster:
>>     id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
>>     health: HEALTH_ERR
>>             1 MDSs report slow metadata IOs
>>             1 MDSs report slow requests
>>             72 nearfull osd(s)
>>             1 pool(s) nearfull
>>             Reduced data availability: 33 pgs inactive, 32 pgs peering
>>             Degraded data redundancy: 123285/153918525 objects degraded (0.080%), 27 pgs degraded, 27 pgs undersized
>>             Degraded data redundancy (low space): 116 pgs backfill_toofull
>>             3 pools have too many placement groups
>>             54 slow requests are blocked > 32 sec
>>             179 stuck requests are blocked > 4096 sec
>>
>>   services:
>>     mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 21h)
>>     mgr: ld5507(active, since 21h), standbys: ld5506, ld5505
>>     mds: pve_cephfs:1 {0=ld3955=up:active} 1 up:standby
>>     osd: 368 osds: 368 up, 368 in; 140 remapped pgs
>>
>>   data:
>>     pools:   6 pools, 8872 pgs
>>     objects: 51.31M objects, 196 TiB
>>     usage:   591 TiB used, 561 TiB / 1.1 PiB avail
>>     pgs:     0.372% pgs not active
>>              123285/153918525 objects degraded (0.080%)
>>              621911/153918525 objects misplaced (0.404%)
>>              8714 active+clean
>>              90   active+remapped+backfill_toofull
>>              26   active+undersized+degraded+remapped+backfill_toofull
>>              16   peering
>>              16   remapped+peering
>>              7    active+remapped+backfill_wait
>>              1    activating
>>              1    active+recovery_wait+degraded
>>              1    active+recovery_wait+undersized+remapped
>>
>> In the log I find these relevant entries:
>> 2019-09-24 13:24:37.073695 mds.ld3955 [WRN] 2 slow requests, 0 included below; oldest blocked for > 18618.873983 secs
>> 2019-09-24 13:24:42.073757 mds.ld3955 [WRN] 2 slow requests, 0 included below; oldest blocked for > 18623.874055 secs
>> 2019-09-24 13:24:47.073852 mds.ld3955 [WRN] 2 slow requests, 0 included below; oldest blocked for > 18628.874149 secs
>> 2019-09-24 13:24:52.073941 mds.ld3955 [WRN] 2 slow requests, 0 included below; oldest blocked for > 18633.874237 secs
>> 2019-09-24 13:24:57.074073 mds.ld3955 [WRN] 2 slow requests, 0 included below; oldest blocked for > 18638.874354 secs
>> 2019-09-24 13:25:02.074118 mds.ld3955 [WRN] 2 slow requests, 0 included below; oldest blocked for > 18643.874415 secs
>>
>> CephFS resides on a pool "hdd" backed by dedicated HDDs (4x 17 1.6TB).
>> This pool is used for RBDs, too.
>>
>> Questions:
>> How can I identify the 2 slow requests?
>> And how can I kill these requests?
>>
>> Regards
>> Thomas
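
For reference, a rough sketch of the diagnostics this usually comes down to. The commands below assume a Nautilus-era cluster, shell access to the host running mds.ld3955 (for the admin-socket calls), and the PG id shown is only a placeholder:

    # Which PGs are inactive/peering and which requests are blocked:
    ceph health detail
    ceph pg dump_stuck inactive

    # Ask a stuck PG why it will not peer (replace 1.2f with a real PG id
    # from the previous output):
    ceph pg 1.2f query

    # On the MDS host: list the slow MDS requests and the client sessions
    # they belong to:
    ceph daemon mds.ld3955 dump_ops_in_flight
    ceph daemon mds.ld3955 session ls

    # Per-OSD fill levels (relevant for the nearfull/backfill_toofull state):
    ceph osd df tree

Killing an MDS request generally means evicting the client session that owns it (e.g. ceph tell mds.ld3955 client evict id=<session id>, if your release supports it), which is disruptive to that client. As Burkhard points out, the inactive/peering PGs are the more likely root cause, and the slow requests should clear on their own once those PGs become active again.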