Wild guess: you hit the PG hard limit. How many PGs per OSD do you have?
If that is the case, increase "osd max pg per osd hard ratio".
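Roughly, and assuming a release with the centralized config store (Mimic or
later), checking and bumping it could look like this; the value 5 is only an
example:

  ceph osd df                                          # PGS column = PGs per OSD
  ceph config set osd osd_max_pg_per_osd_hard_ratio 5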
Check "ceph pg <pgid> query" to see why it isn't activating.
Can you share the output of "ceph osd df tree" and "ceph pg <pgid> query" for the affected PGs?
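For the two PGs from the status output below that would be:

  ceph osd df tree
  ceph pg 21.2e4 query
  ceph pg 23.5 query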
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Wed, Jun 19, 2019 at 8:52 AM Lars Täuber <taeuber@xxxxxxx> wrote:
Hi there!
Recently I made our cluster rack aware
by adding racks to the crush map.
The failure domain was and still is "host".
rule cephfs2_data {
        id 7
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take PRZ
        step chooseleaf indep 0 type host
        step emit
}
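(The rule above is in decompiled-crushmap format; it can also be inspected
directly, with the rule/pool names as used here:

  ceph osd crush rule dump cephfs2_data
  ceph osd pool get cephfs_data crush_rule
)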
Then I sorted the hosts into the new
rack buckets of the crush map to match
their physical locations, by running

# ceph osd crush move onodeX rack=XYZ

for all hosts.
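(As a sketch only, with made-up host and rack names:

  for h in onode1 onode2 onode3; do
      ceph osd crush move "$h" rack=rack1
  done
  ceph osd tree    # verify every host now sits under its rack bucket
)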
The cluster started to reorder the data.
In the end, the cluster now reports:
HEALTH_WARN 1 filesystem is degraded; Reduced data availability: 2 pgs inactive; Degraded data redundancy: 678/2371785 objects degraded (0.029%), 2 pgs degraded, 2 pgs undersized
FS_DEGRADED 1 filesystem is degraded
fs cephfs_1 is degraded
PG_AVAILABILITY Reduced data availability: 2 pgs inactive
pg 21.2e4 is stuck inactive for 142792.952697, current state activating+undersized+degraded+remapped+forced_backfill, last acting [5,2147483647,25,28,11,2]
pg 23.5 is stuck inactive for 142791.437243, current state activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
PG_DEGRADED Degraded data redundancy: 678/2371785 objects degraded (0.029%), 2 pgs degraded, 2 pgs undersized
pg 21.2e4 is stuck undersized for 142779.321192, current state activating+undersized+degraded+remapped+forced_backfill, last acting [5,2147483647,25,28,11,2]
pg 23.5 is stuck undersized for 142789.747915, current state activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
The cluster hosts a cephfs which is
not mountable anymore.
I tried a few things (as you can see:
forced_backfill), but failed.
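(The forced backfill was something along the lines of

  ceph pg force-backfill 21.2e4 23.5

if I recall the exact invocation correctly.)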
The cephfs_data pool is EC 4+2.
Both inactive PGs seem to have enough
shards left to reconstruct the contents
for all OSDs.
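(If it helps, the EC profile can be double-checked like this; the second
command takes whatever profile name the first one prints:

  ceph osd pool get cephfs_data erasure_code_profile
  ceph osd erasure-code-profile get <profile-name>
)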
Is there a chance to get both pgs
clean again?
How can I force the pgs to recalculate
all necessary copies?
Thanks
Lars
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com