Hi,

Has anybody seen PGs stuck inactive like this before? We got our first pool outage:

PG_AVAILABILITY Reduced data availability: 2 pgs inactive
    pg 4.1f1 is stuck inactive for 8637.783533, current state clean+premerge+peered, last acting [312,358,331]
    pg 4.9f1 is stuck inactive for 8637.783331, current state remapped+premerge+backfilling+peered, last acting [312,331,374]

Then we added alerts for PGs sitting in the premerge state for a long time, and got a second outage:

PG_AVAILABILITY Reduced data availability: 2 pgs inactive
    pg 4.1d9 is stuck inactive for 1000.400328, current state remapped+premerge+backfilling+peered, last acting [328,315,352]
    pg 4.9d9 is stuck inactive for 1000.400333, current state remapped+premerge+backfill_wait+peered, last acting [328,315,352]

We have actually reduced pg_num many times since Nautilus; this is the first time the problem has occurred, and only on this one cluster.

Before this, the PG reduction had been running for about a week on this cluster, but it merged only one PG at a time and completely ignored the max_misplaced option - I haven't debugged why yet.

About three hours before the first outage, osd.362 was added to this pool.

Tracker for this: https://tracker.ceph.com/issues/52509

Thanks,
k
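
P.S. For anyone looking into this, a minimal sketch of the commands I would use to inspect the state. The pool name below is a placeholder, and I am assuming "max_misplaced" refers to the mgr option target_max_misplaced_ratio:

    # List all PGs currently stuck inactive (the premerge PGs above show up here)
    ceph pg dump_stuck inactive

    # Query one of the affected PGs for its peering and merge details
    ceph pg 4.1f1 query

    # Throttle that is supposed to limit how much data is misplaced at once
    # while pg_num is being changed (default 0.05)
    ceph config get mgr target_max_misplaced_ratio

    # Current and target pg_num for the pool being merged (pool id 4)
    ceph osd pool get <pool-name> pg_num
    ceph osd pool ls detail | grep "pool 4 "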