Re: Reduced data availability: 2 pgs inactive

Wild guess: you hit the PG hard limit. How many PGs per OSD do you have?
If that is the case, increase "osd max pg per osd hard ratio".
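
For reference, the PGS column of "ceph osd df" shows how many PGs each
OSD currently carries. On Mimic or later the limit can be raised via the
centralized config; the ratio of 4 below is only an example:

  ceph osd df
  ceph config set osd osd_max_pg_per_osd_hard_ratio 4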

Check "ceph pg <pgid> query" to see why it isn't activating.

Can you share the output of "ceph osd df tree" and "ceph pg <pgid> query" of the affected PGs?
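
For the two PGs from the status below, that would be:

  ceph osd df tree
  ceph pg 21.2e4 query
  ceph pg 23.5 query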


Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 19, 2019 at 8:52 AM Lars Täuber <taeuber@xxxxxxx> wrote:
Hi there!

Recently I made our cluster rack-aware
by adding racks to the crush map.
The failure domain was and still is "host".

rule cephfs2_data {
        id 7
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take PRZ
        step chooseleaf indep 0 type host
        step emit
}

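For reference, a rule like the one above can be dumped from a running
cluster with something along these lines (the file names are arbitrary):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt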

Then I sorted the hosts into the new
rack buckets of the crush map, matching
their real physical locations, by running
  # ceph osd crush move onodeX rack=XYZ
for all hosts.
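
Spelled out in full, the commands for this look roughly like the
following; rack01 and onode01 are placeholder names, and PRZ is assumed
to be the root that the rule above takes:

  ceph osd crush add-bucket rack01 rack
  ceph osd crush move rack01 root=PRZ
  ceph osd crush move onode01 rack=rack01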

The cluster started to rebalance the data.

In the end the cluster now reports:
HEALTH_WARN 1 filesystem is degraded; Reduced data availability: 2 pgs inactive; Degraded data redundancy: 678/2371785 objects degraded (0.029%), 2 pgs degraded, 2 pgs undersized
FS_DEGRADED 1 filesystem is degraded
    fs cephfs_1 is degraded
PG_AVAILABILITY Reduced data availability: 2 pgs inactive
    pg 21.2e4 is stuck inactive for 142792.952697, current state activating+undersized+degraded+remapped+forced_backfill, last acting [5,2147483647,25,28,11,2]
    pg 23.5 is stuck inactive for 142791.437243, current state activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
PG_DEGRADED Degraded data redundancy: 678/2371785 objects degraded (0.029%), 2 pgs degraded, 2 pgs undersized
    pg 21.2e4 is stuck undersized for 142779.321192, current state activating+undersized+degraded+remapped+forced_backfill, last acting [5,2147483647,25,28,11,2]
    pg 23.5 is stuck undersized for 142789.747915, current state activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]

The cluster hosts a CephFS which is
no longer mountable.

I tried a few things (as you can see
from the forced_backfill state), but
without success.

The cephfs_data pool is EC 4+2.
Both inactive PGs seem to have enough
shards left to reconstruct the contents
for all OSDs.
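
For reference, the erasure-code profile and the pool's min_size can be
checked like this, assuming the data pool is really called cephfs_data:

  ceph osd pool get cephfs_data erasure_code_profile
  ceph osd erasure-code-profile get <profile name>
  ceph osd pool get cephfs_data min_size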

Is there a chance to get both PGs
clean again?

How can I force the PGs to reconstruct
all necessary shards?


Thanks
Lars
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com