Re: PG inactive when host is down despite CRUSH failure domain being host


 



Hi,

I only took a quick look, but is that pool configured with size 2? The crush_rule says min_size 2 which would explain what you're describing.
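To make the check concrete, here is a rough Python sketch. It assumes the simplified rule that a replicated PG only goes active when its acting set holds at least as many OSDs as the pool-level min_size (which is a separate setting from the min_size field inside the CRUSH rule). The function name and the min_size values tried below are made up for illustration. The real values would come from `ceph osd pool get <pool> size` and `ceph osd pool get <pool> min_size`.

```python
def pg_serves_io(acting, min_size):
    """Return True if a replicated PG with this acting set can serve client I/O.

    Simplified model (an assumption, not Ceph's full peering logic): the PG
    goes active only when the number of OSDs in its acting set is at least
    the pool's min_size; otherwise it stays peered, i.e. inactive.
    """
    return len(acting) >= min_size


# Acting set of PG 116.12 as shown in the pg query output quoted below.
acting = [288, 726]

# Hypothetical pool min_size values; none of them is confirmed by the thread.
for min_size in (1, 2, 3):
    state = "active" if pg_serves_io(acting, min_size) else "peered (inactive)"
    print(f"min_size={min_size}: {state}")
```

Under this model, two surviving replicas are enough only if the pool's min_size is 2 or lower, so the pool settings are the first thing I would compare against the stuck PGs.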



Quoting Janek Bevendorff <janek.bevendorff@xxxxxxxxxxxxx>:

Hi,

I am seeing a weird phenomenon that I am having trouble debugging. We have 16 OSDs per host, so when I reboot one node, 16 OSDs will be missing for a short time. Since our minimum CRUSH failure domain is host, this should not cause any problems. Unfortunately, I always have a handful (1-5) of PGs that become inactive nonetheless and are stuck in the state undersized+degraded+peered until the host and its OSDs are back up. The other 2000+ PGs that are also on these OSDs do not have this problem. In total, we have between 110 and 150 PGs per OSD with a configured maximum of 250, which should give us enough headroom.

The affected pools always seem to be RBD pools or at least I haven't seen it on our much larger RGW pool yet. The pool's CRUSH rule looks like this:

rule rbd-data {
        id 8
        type replicated
        min_size 2
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

ceph pg dump_stuck inactive gives me this:

PG_STAT  STATE                       UP          UP_PRIMARY  ACTING      ACTING_PRIMARY
115.3    undersized+degraded+peered  [194,267]   194         [194,267]   194
115.13   undersized+degraded+peered  [151,1122]  151         [151,1122]  151
116.12   undersized+degraded+peered  [288,726]   288         [288,726]   288

and when I query one of the inactive PGs, I see (among other things):

    "up": [
        288,
        726
    ],
    "acting": [
        288,
        726
    ],
    "acting_recovery_backfill": [
        "288",
        "726"
    ],

    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "enter_time": "2021-03-10T16:23:09.301174+0100",
            "might_have_unfound": [],
            "recovery_progress": {
                "backfill_targets": [],
                "waiting_on_backfill": [],
                "last_backfill_started": "MIN",
                "backfill_info": {
                    "begin": "MIN",
                    "end": "MIN",
                    "objects": []
                },
                "peer_backfill_info": [],
                "backfills_in_flight": [],
                "recovering": [],
                "pg_backend": {
                    "pull_from_peer": [],
                    "pushing": []
                }
            }
        },
        {
            "name": "Started",
            "enter_time": "2021-03-10T16:23:08.297622+0100"
        }
    ],

So you can see that two out of three OSDs on other hosts are indeed up and active. I also see the ceph-osd daemons running on those hosts, so the data is definitely there and the PG should be available. Do you have any idea why these PGs become inactive nonetheless? I suspect some kind of concurrency limit, but I wouldn't know which one that could be.

Thanks
Janek
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

