Re: Low space hindering backfill and 2 backfillfull osd(s)

Thank you very much for the detailed explanation. I'll wait then; based on the current speed, about 5 more hours. Let's see.

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

-----Original Message-----
From: Janne Johansson <icepic.dz@xxxxxxxxx>
Sent: Friday, October 14, 2022 5:26 PM
To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
Cc: Ceph Users <ceph-users@xxxxxxx>
Subject: Re:  Low space hindering backfill and 2 backfillfull osd(s)


On Fri, 14 Oct 2022 at 12:10, Szabo, Istvan (Agoda)
<Istvan.Szabo@xxxxxxxxx> wrote:
> I've added 5 more nodes to my cluster and got this issue.
> HEALTH_WARN 2 backfillfull osd(s); 17 pool(s) backfillfull; Low space hindering backfill (add storage if this doesn't resolve itself): 4 pgs backfill_toofull
> OSD_BACKFILLFULL 2 backfillfull osd(s)
>     osd.150 is backfill full
>     osd.178 is backfill full
>
> I read on the mailing list that I might need to increase the pg count on some pools to get smaller PGs.
> I also read that I might need to reweight the mentioned full OSDs to 1.2 until it's OK, then set them back.
> Which would be the best solution?


It is not unusual to see "backfill_toofull", especially if the reason for expanding was that space was getting tight.

When you add new drives, a lot of PGs need to move, not only from "old OSDs to new" but in all possible directions.
As an example, if you had 16 PGs and three hosts (A, B and C), the PGs would end up something like:

A 1,4,7,10,13,16
B 2,5,8,11,14
C 3,6,9,12,15
(5-6 PGs per host)

Then you add hosts D and E, and now it should become something like:

A 1,6,11,16
B 2,7,12
C 3,8,13
D 4,9,14
E 5,10,15
(3-4 PGs per host)

From here we can see that A will keep PGs 1 and 16, B keeps PG 2 and C keeps PG 3, but more or less ALL the other PGs will be moving about.
D and E will of course receive PGs because they were just added, but A will also send PG 7 to host B, B sends PG 8 to host C, and so on.
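
To make that concrete, here is a tiny Python sketch (plain modulo placement, not real CRUSH, so only an illustration) that reproduces the two layouts above and counts the movers:

# Toy placement: PG n lives on host number (n - 1) mod (host count).
# This is NOT CRUSH, just the round-robin layout from the example above.
PGS = range(1, 17)

def place(pg, hosts):
    return hosts[(pg - 1) % len(hosts)]

before = {pg: place(pg, "ABC") for pg in PGS}
after  = {pg: place(pg, "ABCDE") for pg in PGS}

stayers = [pg for pg in PGS if before[pg] == after[pg]]
movers  = [pg for pg in PGS if before[pg] != after[pg]]
print("stay:", stayers)   # [1, 2, 3, 16]
print("move:", movers)    # the other 12 PGs, in all directions
for pg in movers:
    print(f"pg {pg}: {before[pg]} -> {after[pg]}")

Only 4 of the 16 PGs stay where they are; the other 12 move, and not only onto D and E.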

If A, B and C are almost full and you add new OSDs (D and E), the cluster will try to schedule *all* the moves.

Of course PGs 4, 5, 9, 10, 14 and 15 can start copying at any time, since D and E arrive empty. But the cluster will also ask A to send PG 7 to B, and B to send PG 8 to C, and if PG 7 would push B past the backfillfull limit, or PG 8 would push host C past it, those moves are paused in the backfill_toofull state and the PGs just remain "misplaced"/"remapped".

In the meantime, the other moves get handled, and sooner or later hosts B and C will have moved off enough data that PGs 7 and 8 can go to their correct places, but this might mean they are among the last to move.
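
To illustrate the pausing and eventual un-pausing, here is a toy scheduler in Python (my own simplification, not Ceph's actual backfill code; the capacity of 6 PGs per host is made up for the example):

CAPACITY = 6   # pretend a host is "backfillfull" above 6 PGs
usage = {"A": 6, "B": 5, "C": 5, "D": 0, "E": 0}   # the "before" layout
# (pg, source, destination) for a few of the moves discussed above
pending = [(6, "C", "A"), (7, "A", "B"), (8, "B", "C"),
           (4, "A", "D"), (5, "B", "E")]

round_no = 0
while pending:
    round_no += 1
    progressed = False
    for move in list(pending):
        pg, src, dst = move
        if usage[dst] + 1 > CAPACITY:
            # would push dst past backfillfull: pause, PG stays misplaced
            print(f"round {round_no}: pg {pg} {src}->{dst} backfill_toofull")
            continue
        usage[src] -= 1
        usage[dst] += 1
        pending.remove(move)
        progressed = True
        print(f"round {round_no}: pg {pg} {src}->{dst} done")
    if not progressed:
        break   # nothing can move at all any more: time to really add storage

Running it, PG 6 (C -> A) is reported backfill_toofull in round 1 while A is still full, and then completes in round 2 once A has shed PGs 4 and 7.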

The reality is not 100% as simple as this: the straw2 bucket placement algorithm tries to prevent parts of it, there may be cases where two of the old hosts send PGs to each other, basically just swapping them around, and the fact that every PG is made up of EC k+m or #replica parts makes this explanation a bit too simple. But in broad terms, this is why you get "errors" when adding new empty drives. It is perfectly OK, and will fix itself as soon as the other moves have created enough space for the queued-toofull moves to be performed without driving an OSD over the limits.

--
May the most significant bit of your life be positive.



