This cluster has been unhealthy for a long time, so this issue is not
happening out of the blue.

root@ld3955:~# ceph -s
  cluster:
    id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            noscrub,nodeep-scrub flag(s) set
            Reduced data availability: 1 pg inactive, 1 pg down
            1 subtrees have overcommitted pool target_size_bytes
            1 subtrees have overcommitted pool target_size_ratio
            18 slow requests are blocked > 32 sec
            mons ld5505,ld5506 are low on available space

  services:
    mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 2h)
    mgr: ld5507(active, since 28h), standbys: ld5506, ld5505
    mds: cephfs:1 {0=ld4465=up:active} 1 up:standby
    osd: 441 osds: 438 up, 438 in
         flags noscrub,nodeep-scrub

  data:
    pools:   6 pools, 8432 pgs
    objects: 63.28M objects, 241 TiB
    usage:   723 TiB used, 796 TiB / 1.5 PiB avail
    pgs:     0.012% pgs not active
             8431 active+clean
             1    creating+down

  io:
    client:   33 MiB/s rd, 14.20k op/s rd, 0 op/s wr


On 15.11.2019 13:24, Wido den Hollander wrote:
>
> On 11/15/19 11:22 AM, Thomas Schneider wrote:
>> Hi,
>> ceph health is reporting: pg 59.1c is creating+down, acting [426,438]
>>
>> root@ld3955:~# ceph health detail
>> HEALTH_WARN 1 MDSs report slow metadata IOs; noscrub,nodeep-scrub
>> flag(s) set; Reduced data availability: 1 pg inactive, 1 pg down; 1
>> subtrees have overcommitted pool target_size_bytes; 1 subtrees have
>> overcommitted pool target_size_ratio; mons ld5505,ld5506 are low on
>> available space
>> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>>     mdsld4465(mds.0): 8 slow metadata IOs are blocked > 30 secs, oldest
>> blocked for 120721 secs
>> OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
>> PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg down
>>     pg 59.1c is creating+down, acting [426,438]
>> MON_DISK_LOW mons ld5505,ld5506 are low on available space
>>     mon.ld5505 has 22% avail
>>     mon.ld5506 has 29% avail
>>
>> root@ld3955:~# ceph pg dump_stuck inactive
>> ok
>> PG_STAT STATE         UP        UP_PRIMARY ACTING    ACTING_PRIMARY
>> 59.1c   creating+down [426,438] 426        [426,438] 426
>>
>> How can I fix this?
> Did you change anything on the cluster?
>
> Can you share this output:
>
> $ ceph status
>
> It seems that more things are wrong with this system. This doesn't
> happen out of the blue. Something must have happened.
>
> Wido
>
>> THX
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
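
[Editor's note: a minimal sketch of follow-up diagnostics for a PG stuck in
creating+down, assuming the IDs shown in the thread above (pg 59.1c, OSDs 426
and 438). These are standard Ceph CLI commands, not output or advice from the
thread itself.]

    # Full peering state of the stuck PG; the "recovery_state" and
    # "blocked_by" sections usually show what it is waiting for:
    ceph pg 59.1c query

    # Current up/acting mapping of the PG:
    ceph pg map 59.1c

    # Settings of the pool the PG belongs to (size, min_size, crush_rule):
    ceph osd pool ls detail

    # Status and CRUSH location of the two acting OSDs:
    ceph osd find 426
    ceph osd find 438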