Re: PG in state: creating+down

Wido den Hollander <wido@xxxxxxxx> · Fri, 15 Nov 2019 15:59:54 +0100

On 11/15/19 1:29 PM, Thomas Schneider wrote:
> This cluster has a long history of being unhealthy, which means this
> issue did not come out of the blue.
> 
> root@ld3955:~# ceph -s
>   cluster:
>     id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
>     health: HEALTH_WARN
>             1 MDSs report slow metadata IOs
>             noscrub,nodeep-scrub flag(s) set
>             Reduced data availability: 1 pg inactive, 1 pg down
>             1 subtrees have overcommitted pool target_size_bytes
>             1 subtrees have overcommitted pool target_size_ratio
>             18 slow requests are blocked > 32 sec
>             mons ld5505,ld5506 are low on available space
> 
>   services:
>     mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 2h)
>     mgr: ld5507(active, since 28h), standbys: ld5506, ld5505
>     mds: cephfs:1 {0=ld4465=up:active} 1 up:standby
>     osd: 441 osds: 438 up, 438 in

I think this is the problem. According to this line, 3 of your 441 OSDs
are neither up nor in, and those are probably needed to get that PG back
online.
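
If you want to check that gap mechanically rather than by eye, here is a
minimal sketch that parses the "osd:" line from the status output quoted
above. The helper name missing_osds is hypothetical, and the regular
expression only assumes the "N osds: N up, N in" layout shown in this
thread; other Ceph releases may format the line differently.

```python
import re


def missing_osds(status_line: str) -> dict:
    """Parse an 'osd:' line from `ceph -s` and count OSDs that are not up / not in."""
    # Assumed layout (taken from the output in this thread):
    #   "osd: 441 osds: 438 up, 438 in"
    m = re.search(r"(\d+)\s+osds:\s+(\d+)\s+up\b.*?(\d+)\s+in\b", status_line)
    if m is None:
        raise ValueError(f"unrecognised osd line: {status_line!r}")
    total, up, in_ = (int(g) for g in m.groups())
    return {
        "total": total,
        "up": up,
        "in": in_,
        "not_up": total - up,   # OSDs known to the cluster but not running
        "not_in": total - in_,  # OSDs not currently taking part in data placement
    }


if __name__ == "__main__":
    print(missing_osds("osd: 441 osds: 438 up, 438 in"))
```

For the line above this reports 3 OSDs not up and 3 not in. It does not
tell you which OSDs they are or whether pg 59.1c actually depends on them;
the output of "ceph pg 59.1c query" should show what the PG is waiting
for.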

>          flags noscrub,nodeep-scrub
> 
>   data:
>     pools:   6 pools, 8432 pgs
>     objects: 63.28M objects, 241 TiB
>     usage:   723 TiB used, 796 TiB / 1.5 PiB avail
>     pgs:     0.012% pgs not active
>              8431 active+clean
>              1    creating+down
> 
>   io:
>     client:   33 MiB/s rd, 14.20k op/s rd, 0 op/s wr
> 
> 
> Am 15.11.2019 um 13:24 schrieb Wido den Hollander:
>>
>> On 11/15/19 11:22 AM, Thomas Schneider wrote:
>>> Hi,
>>> ceph health is reporting: pg 59.1c is creating+down, acting [426,438]
>>>
>>> root@ld3955:~# ceph health detail
>>> HEALTH_WARN 1 MDSs report slow metadata IOs; noscrub,nodeep-scrub
>>> flag(s) set; Reduced data availability: 1 pg inactive, 1 pg down; 1
>>> subtrees have overcommitted pool target_size_bytes; 1 subtrees have
>>> overcommitted pool target_size_ratio; mons ld5505,ld5506 are low on
>>> available space
>>> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>>>     mdsld4465(mds.0): 8 slow metadata IOs are blocked > 30 secs, oldest
>>> blocked for 120721 secs
>>> OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
>>> PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg down
>>>     pg 59.1c is creating+down, acting [426,438]
>>> MON_DISK_LOW mons ld5505,ld5506 are low on available space
>>>     mon.ld5505 has 22% avail
>>>     mon.ld5506 has 29% avail
>>>
>>> root@ld3955:~# ceph pg dump_stuck inactive
>>> ok
>>> PG_STAT STATE         UP        UP_PRIMARY ACTING    ACTING_PRIMARY
>>> 59.1c   creating+down [426,438]        426 [426,438]            426
>>>
>>> How can I fix this?
>> Did you change anything on the cluster?
>>
>> Can you share this output:
>>
>> $ ceph status
>>
>> It seems that more things are wrong with this system. This doesn't
>> happen out of the blue. Something must have happened.
>>
>> Wido
>>
>>> THX
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>
> 