Re: Low space hindering backfill and 2 backfillfull osd(s)

Den fre 14 okt. 2022 kl 12:10 skrev Szabo, Istvan (Agoda)
<Istvan.Szabo@xxxxxxxxx>:
> I've added 5 more nodes to my cluster and got this issue.
> HEALTH_WARN 2 backfillfull osd(s); 17 pool(s) backfillfull; Low space hindering backfill (add storage if this doesn't resolve itself): 4 pgs backfill_toofull
> OSD_BACKFILLFULL 2 backfillfull osd(s)
>     osd.150 is backfill full
>     osd.178 is backfill full
>
> I read on the mailing list that I might need to increase the pg_num on some pools to get smaller PGs.
> I also read that I might need to reweight the mentioned full OSDs to 1.2 until it's OK, then set them back.
> Which would be the best solution?


It is not unusual to see "backfill_toofull", especially if the reason
for expanding was that space was getting tight.

When you add new drives, a lot of PGs need to move, not only from "old
OSDs to new" but in all possible directions.
As an example, if you had 16 PGs and three hosts (A, B and C), the
PGs would end up something like:

A 1,4,7,10,13,16
B 2,5,8,11,14
C 3,6,9,12,15
(5-6 PGs per host)

Then you add hosts D and E, and the layout should become something like:

A 1,6,11,16
B 2,7,12
C 3,8,13
D 4,9,14
E 5,10,15
(3-4 PGs per host)

From here we can see that A will keep PGs 1 and 16, B will keep PG
2, and C keeps PG 3, but more or less ALL the other PGs will be moving about.
D and E will of course receive PGs because they were just added, but A
will also send PG 7 to host B, B will send PG 8 to host C, and so on.
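The bookkeeping above can be reproduced with a short sketch. Note this is a toy round-robin placement chosen only to mirror the 16-PG example; real Ceph placement uses CRUSH, not a modulo:

```python
# Toy round-robin placement -- NOT real CRUSH -- just to mirror the
# 16-PG example above and list what each old host keeps or sends.
def place(pgs, hosts):
    """Assign each PG to a host round-robin; return host -> [PGs]."""
    layout = {h: [] for h in hosts}
    for i, pg in enumerate(pgs):
        layout[hosts[i % len(hosts)]].append(pg)
    return layout

pgs = list(range(1, 17))                          # 16 PGs
before = place(pgs, ["A", "B", "C"])              # three old hosts
after = place(pgs, ["A", "B", "C", "D", "E"])     # after adding D and E

# For each old host, what stays put and what has to move (and where)?
dest = {pg: h for h, members in after.items() for pg in members}
for host, members in before.items():
    kept = [pg for pg in members if dest[pg] == host]
    sent = {pg: dest[pg] for pg in members if dest[pg] != host}
    print(host, "keeps", kept, "sends", sent)
```

Running it shows exactly the pattern described: A keeps 1 and 16, B keeps 2, C keeps 3, and every other PG moves, including old-to-old moves such as PG 7 to B and PG 8 to C.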

If A,B and C are almost full and you add new OSDs (D and E), the
cluster will try to schedule *all* the moves.

Of course, PGs 4, 5, 9, 10, 14 and 15 can start copying at any time,
since D and E arrive empty, but the cluster will also ask A to send
PG 7 to B, and B to send PG 8 to C. If PG 7 would push B past the
backfillfull limit, or if PG 8 would push host C past it, those moves
are paused with the state backfill_toofull and the PGs just stay
"misplaced"/"remapped" for a while.

In the meantime, the other moves get handled, and sooner or later
hosts B and C will have moved off enough data that PGs 7 and 8 can
move to their correct places, though this may mean they are among the
last to move.

The reality is not 100% as simple as this: the straw2 bucket placement
algorithm tries to prevent some of these moves, there can be cases
where two of the old hosts send PGs to each other, basically just
swapping them around, and the fact that any PG is made up of EC k+m
(or #replica) parts makes this explanation a bit too simple. But in
broad terms, this is why you get "errors" when adding new empty
drives. It is perfectly OK, and it will fix itself as soon as the
other moves have freed enough space for the queued-toofull moves to
be performed without driving an OSD over the limits.
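That queueing behaviour can be sketched as a tiny simulation. Everything here is made up for illustration (capacities of 10, PG size of 1, a backfillfull ratio of 0.90); it is not how Ceph actually schedules backfill, just a model of "deferred moves succeed on a later pass once other moves have freed space":

```python
# Illustrative-only model of backfill_toofull queueing. Each host has
# capacity 10, each PG counts as 1 unit, and a move is deferred if it
# would push the target past the backfillfull ratio. All numbers are
# invented for the example; this is not Ceph's real scheduler.
BACKFILLFULL = 0.90

used = {"A": 8.5, "B": 8.5, "C": 8.5, "D": 0.0, "E": 0.0}
capacity = {h: 10.0 for h in used}

# (pg, src, dst) moves taken from the 16-PG example above:
# old-to-old moves first, then moves to the new empty hosts D and E.
pending = [(7, "A", "B"), (8, "B", "C"), (11, "B", "A"),
           (6, "C", "A"), (13, "A", "C"), (12, "C", "B"),
           (4, "A", "D"), (10, "A", "E"), (5, "B", "E"),
           (14, "B", "D"), (9, "C", "D"), (15, "C", "E")]

passes = 0
while pending:
    passes += 1
    deferred = []
    for pg, src, dst in pending:
        if used[dst] + 1 <= BACKFILLFULL * capacity[dst]:
            used[dst] += 1          # backfill completes on the target
            used[src] -= 1          # source drops its copy afterwards
        else:
            deferred.append((pg, src, dst))  # stays backfill_toofull
    if len(deferred) == len(pending):
        break                       # nothing could move this pass
    pending = deferred

print(passes, pending)
```

On the first pass the old-to-old moves (7, 8, 11, 6, 13, 12) are all deferred because A, B and C sit above the threshold, while the moves to the empty hosts D and E go through; the freed space then lets every deferred move complete on the second pass, which is the "fixes itself" behaviour described above.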

-- 
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
