On 11/15/19 1:29 PM, Thomas Schneider wrote:
> This cluster has a long history of being unhealthy, which means this
> issue is not happening out of the blue.
>
> root@ld3955:~# ceph -s
>   cluster:
>     id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
>     health: HEALTH_WARN
>             1 MDSs report slow metadata IOs
>             noscrub,nodeep-scrub flag(s) set
>             Reduced data availability: 1 pg inactive, 1 pg down
>             1 subtrees have overcommitted pool target_size_bytes
>             1 subtrees have overcommitted pool target_size_ratio
>             18 slow requests are blocked > 32 sec
>             mons ld5505,ld5506 are low on available space
>
>   services:
>     mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 2h)
>     mgr: ld5507(active, since 28h), standbys: ld5506, ld5505
>     mds: cephfs:1 {0=ld4465=up:active} 1 up:standby
>     osd: 441 osds: 438 up, 438 in

I think this is the problem. You are missing a few OSDs which are
probably needed to get that PG back online.

>          flags noscrub,nodeep-scrub
>
>   data:
>     pools:   6 pools, 8432 pgs
>     objects: 63.28M objects, 241 TiB
>     usage:   723 TiB used, 796 TiB / 1.5 PiB avail
>     pgs:     0.012% pgs not active
>              8431 active+clean
>              1    creating+down
>
>   io:
>     client: 33 MiB/s rd, 14.20k op/s rd, 0 op/s wr
>
>
> On 15.11.2019 at 13:24, Wido den Hollander wrote:
>>
>> On 11/15/19 11:22 AM, Thomas Schneider wrote:
>>> Hi,
>>> ceph health is reporting: pg 59.1c is creating+down, acting [426,438]
>>>
>>> root@ld3955:~# ceph health detail
>>> HEALTH_WARN 1 MDSs report slow metadata IOs; noscrub,nodeep-scrub
>>> flag(s) set; Reduced data availability: 1 pg inactive, 1 pg down;
>>> 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees
>>> have overcommitted pool target_size_ratio; mons ld5505,ld5506 are
>>> low on available space
>>> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>>>     mdsld4465(mds.0): 8 slow metadata IOs are blocked > 30 secs,
>>>     oldest blocked for 120721 secs
>>> OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
>>> PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg down
>>>     pg 59.1c is creating+down, acting [426,438]
>>> MON_DISK_LOW mons ld5505,ld5506 are low on available space
>>>     mon.ld5505 has 22% avail
>>>     mon.ld5506 has 29% avail
>>>
>>> root@ld3955:~# ceph pg dump_stuck inactive
>>> ok
>>> PG_STAT  STATE          UP         UP_PRIMARY  ACTING     ACTING_PRIMARY
>>> 59.1c    creating+down  [426,438]  426         [426,438]  426
>>>
>>> How can I fix this?
>> Did you change anything on the cluster?
>>
>> Can you share this output:
>>
>> $ ceph status
>>
>> It seems that more things are wrong with this system. This doesn't
>> happen out of the blue. Something must have happened.
>>
>> Wido
>>
>>> THX
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
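
To check which of the 441 OSDs are the three that are down and may be
holding pg 59.1c back, the OSD tree can be filtered by state on recent
releases (a minimal sketch, assuming a Nautilus-era CLI; run on any
node with admin keyring access):

$ ceph osd tree down    # show only OSDs currently marked down
$ ceph osd tree out     # show only OSDs currently marked out

The host column in the output then points at the machines whose OSD
daemons need to be restarted or investigated.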
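For the stuck PG itself, querying it directly usually shows why peering
cannot complete (a sketch; the exact JSON field names can vary between
releases):

$ ceph pg 59.1c query   # inspect the recovery_state section, e.g.
                        # down_osds_we_would_probe or blocked_by entries

If the recovery_state section lists OSDs it still wants to probe, those
are the OSDs that have to come back (or be dealt with) before the PG
can leave the creating+down state.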