On Tue, Oct 1, 2019 at 5:25 AM Frank Schilder <frans@xxxxxx> wrote:
>
> I'm running a ceph fs with an 8+2 EC data pool. Disks are on 10 hosts and the failure domain is host. Version is mimic 13.2.2. Today I added a few OSDs to one of the hosts and observed that a lot of PGs became inactive even though 9 out of 10 hosts were up the whole time. After getting the 10th host and all disks back up, I still ended up with a large number of undersized PGs and degraded objects, which I don't understand since no OSD was removed.
>
> Here are some details about the steps taken on the host with new disks, main questions at the end:
>
> - shut down OSDs (systemctl stop docker)
> - reboot host (this is necessary due to OS deployment via warewulf)
>
> Devices got renamed and not all disks came back up (4 OSDs remained down). This is expected; I need to re-deploy the containers to adjust for the device name changes. Around this point PGs started peering and some failed while waiting for one of the down OSDs. I don't understand why they didn't just remain active with 9 out of 10 disks. Until the moment some OSDs came back up, all PGs were active. With min_size=9 I would expect all PGs to remain active while 9 out of the 10 hosts are untouched.
>
> - redeploy docker containers
> - all disks/OSDs come up, including the 4 OSDs from above
> - inactive PGs complete peering and become active
> - now I have a lot of degraded objects and undersized PGs even though not a single OSD was removed
>
> I don't understand why I have degraded objects. I should just have misplaced objects:
>
> HEALTH_ERR
>     22995992/145698909 objects misplaced (15.783%)
>     Degraded data redundancy: 5213734/145698909 objects degraded (3.578%), 208 pgs degraded, 208 pgs undersized
>     Degraded data redundancy (low space): 169 pgs backfill_toofull
>
> Note: The backfill_toofull with low utilization (usage: 38 TiB used, 1.5 PiB / 1.5 PiB avail) is a known issue in ceph (https://tracker.ceph.com/issues/39555).
>
> Also, I should be able to do whatever I like with 1 out of 10 hosts without losing data access. What could be the problem here?
>
> Questions summary:
>
> Why does peering not succeed in keeping all PGs active with 9 out of 10 OSDs up and in?

I would just double-check that min_size is really 9 for your pool. It should be set to that, and a wrong min_size is the only reason I can think of for what you are seeing.

> Why do undersized PGs arise even though all OSDs are up?

I've noticed on my cluster that sometimes when an OSD goes down, the EC pool still considers the OSD missing when it comes back online and its data needs to resync. I'm not sure what exactly causes this, but it happens more often than it should.

> Why do degraded objects arise even though no OSD was removed?

If you are writing objects while the PGs are undersized (host/OSDs down), then Ceph has to sync those writes to the OSDs that were down once they come back. That is the number of degraded objects you are seeing.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
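
P.S. A quick way to double-check the pool settings from the CLI (a rough sketch; 'cephfs_data' is only a placeholder for your actual data pool name, and the profile name comes from the pool listing):

    # show size/min_size and which EC profile the pool uses
    ceph osd pool ls detail
    # 'cephfs_data' is a placeholder -- substitute your real data pool name
    ceph osd pool get cephfs_data min_size
    ceph osd erasure-code-profile get <profile-from-ls-detail>

    # for an undersized/degraded PG, this shows which shards it thinks are missing
    ceph pg <pgid> query

If min_size turns out to be 10 (k+m) rather than 9 (k+1), setting it back with 'ceph osd pool set cephfs_data min_size 9' should let PGs stay active with a single host down.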