EC Backfill Observations


 



Hey all,

I wanted to confirm my understanding of some of the mechanics of
backfill in EC pools. I've yet to find a document that outlines this
in detail; if there is one, please send it my way. :) Some of what I
write below is likely in the "well, duh" category, but I tended
towards completeness.

First off, my understanding is that backfill reservations work the same
way in replicated pools and EC pools: a local reservation is taken on
the primary OSD, then a remote reservation on each backfill target,
before the backfill is allowed to begin. Until both are granted, the PG
sits in the backfill_wait state.
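
A toy model of the two-step sequence above may make it concrete. This
is a simplified sketch, not Ceph's reserver code: the per-OSD
`capacity` stands in for osd_max_backfills, the function and class
names are hypothetical, and real OSDs queue requests by priority
rather than failing and retrying as this sketch does.

```python
from dataclasses import dataclass, field


@dataclass
class ReservationSlots:
    """Hypothetical per-OSD slot counter; capacity plays the role of osd_max_backfills."""
    capacity: int
    held: set = field(default_factory=set)

    def try_reserve(self, pgid: str) -> bool:
        if pgid in self.held:          # already holding a slot for this PG
            return True
        if len(self.held) >= self.capacity:
            return False
        self.held.add(pgid)
        return True

    def release(self, pgid: str) -> None:
        self.held.discard(pgid)


def request_backfill(pgid, primary, targets, local_slots, remote_slots):
    """Return the PG state after one reservation attempt (simplified model)."""
    # Step 1: local reservation on the primary OSD.
    if not local_slots[primary].try_reserve(pgid):
        return "backfill_wait"
    # Step 2: remote reservation on every backfill target.
    granted = []
    for osd in targets:
        if remote_slots[osd].try_reserve(pgid):
            granted.append(osd)
            continue
        # A target is full: roll back partial grants and keep waiting.
        for g in granted:
            remote_slots[g].release(pgid)
        local_slots[primary].release(pgid)
        return "backfill_wait"
    return "backfilling"


local = {0: ReservationSlots(capacity=1)}
remote = {5: ReservationSlots(capacity=1)}
print(request_backfill("1.1", 0, [5], local, remote))  # backfilling
print(request_backfill("1.2", 0, [5], local, remote))  # backfill_wait
```

Note that nothing in this model touches the OSD the data is read
from, which is the gap raised in question 2 below.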

Once the backfill starts, though, the two pool types diverge. Let's say
we have an EC 3+2 PG (k=3, m=2) that's backfilling from OSD 2 to OSD 5
(formatted here like pgs_brief):

    1.1  active+remapped+backfilling   [0,1,5,3,4]  0   [0,1,2,3,4]  0
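
For anyone following along, the source and target of each moving shard
can be read off such a row by comparing the up and acting sets
position by position. A small sketch, assuming the column order
PGID STATE UP UP_PRIMARY ACTING ACTING_PRIMARY shown above; the helper
names are made up for illustration:

```python
def parse_osd_list(text: str) -> list[int]:
    """Turn "[0,1,5,3,4]" into [0, 1, 5, 3, 4]."""
    return [int(x) for x in text.strip("[]").split(",")]


def parse_pgs_brief_row(line: str):
    """Split one row into (pgid, state, up, up_primary, acting, acting_primary)."""
    pgid, state, up, up_primary, acting, acting_primary = line.split()
    return (pgid, state, parse_osd_list(up), int(up_primary),
            parse_osd_list(acting), int(acting_primary))


def backfill_moves(up: list[int], acting: list[int]) -> list[tuple[int, int, int]]:
    """Return (shard, source_osd, target_osd) wherever up and acting differ.

    Assumes an EC pool, where list position is the shard id; in a
    replicated pool the order does not carry that meaning.
    """
    return [(shard, src, dst)
            for shard, (dst, src) in enumerate(zip(up, acting))
            if src != dst]


row = "1.1  active+remapped+backfilling   [0,1,5,3,4]  0   [0,1,2,3,4]  0"
_, state, up, _, acting, _ = parse_pgs_brief_row(row)
print(state, backfill_moves(up, acting))
# active+remapped+backfilling [(2, 2, 5)]
```

So shard 2 is moving from OSD 2 (acting) to OSD 5 (up).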

The question in my mind was: Where is the data for this backfill
coming from? In replicated pools, all reads come from the primary.
However, in this case, the primary does not have the data in question;
the primary has to either read the EC chunk from OSD 2, or it has to
reconstruct it by reading from 3 of the OSDs in the acting set.
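
Why 3 reads are enough: with k=3, any 3 of the 5 shards determine the
other 2. Below is a toy illustration of that property using a
systematic Reed-Solomon-style code built from polynomial interpolation
over the prime field GF(257). This is an assumption-laden stand-in,
not what Ceph runs: its erasure-code plugins (e.g. jerasure, ISA-L) do
the arithmetic over GF(2^8), and the function names here are
hypothetical.

```python
P = 257  # toy prime modulus; real plugins use GF(2^8) so parity fits in a byte


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Division via Fermat's little theorem: den^(P-2) is den's inverse mod P.
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, m):
    """Systematic encode: shards 0..k-1 are the data, shards k..k+m-1 are parity."""
    k = len(data)
    pts = list(enumerate(data))
    return data + [lagrange_eval(pts, x) for x in range(k, k + m)]


def reconstruct(shards, available, missing):
    """Rebuild shard `missing` from exactly k surviving shards listed in `available`."""
    pts = [(i, shards[i]) for i in available]
    return lagrange_eval(pts, missing)


shards = encode([10, 20, 30], m=2)
print(shards)                                                  # [10, 20, 30, 40, 50]
print(reconstruct(shards, available=[0, 1, 3], missing=2))     # 30
```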

Based on observation, I _think_ this is what happens:
1. As long as the PG is not degraded, the backfill read is simply
forwarded by the primary to OSD 2.
2. Once the PG becomes degraded, the backfill read needs to use the
reconstructing path, and begins reading from 3 of the OSDs in the
acting set.
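
The two observed cases can be written down as a small model, mostly
to make the open questions precise. This is a hypothesis that mirrors
my observations, not Ceph's actual read planner: the function name is
made up, and the rule for which k shards get picked in the degraded
case (lowest-numbered available) is an assumption.

```python
def plan_backfill_read(shard, acting, k, degraded, unavailable=frozenset()):
    """Return the (shard, osd) pairs read to backfill `shard` (hypothetical model)."""
    # Case 1 (observed): PG not degraded, read forwarded to the OSD holding the shard.
    if not degraded:
        return [(shard, acting[shard])]
    # Case 2 (observed): PG degraded, reconstruct from k shards.
    # Which k shards are chosen is an assumption: the lowest-numbered available ones.
    candidates = [(s, osd) for s, osd in enumerate(acting) if osd not in unavailable]
    if len(candidates) < k:
        raise ValueError("fewer than k shards available; data cannot be reconstructed")
    return candidates[:k]


acting = [0, 1, 2, 3, 4]
print(plan_backfill_read(2, acting, k=3, degraded=False))
# [(2, 2)]
print(plan_backfill_read(2, acting, k=3, degraded=True, unavailable={2}))
# [(0, 0), (1, 1), (3, 3)]
```

Question 1 below amounts to asking whether the `degraded` flag alone
drives the switch, or whether it depends on which OSD is unavailable.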

Questions:
1. Can anyone confirm or correct my description of how EC backfill
operates? In particular, in case 2 above, does it matter whether OSD 2
is the cause of degradation, for example? Does the read still get
forwarded to a single OSD when it's parity chunks that are being moved
via backfill?
2. I'm curious why a third reservation, for the source OSD, wasn't
introduced as part of EC support in Ceph. We've occasionally seen an
OSD become overloaded because several backfills were reading from it
simultaneously, and there's no way to limit this via the normal
osd_max_backfills mechanism. Is anyone aware of discussions to this
effect?
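
To make question 2 concrete, here is a sketch of what a source-side
reservation might look like. Everything in it is hypothetical: Ceph
has no such reserver, and `max_source_backfills` is an invented
parameter, not a real config option. The all-or-nothing reservation
over a set of OSDs reflects one design wrinkle: in the degraded case
the backfill reads from k OSDs instead of one, so the set to reserve
changes with PG health.

```python
from collections import Counter


class SourceReserver:
    """Hypothetical per-OSD cap on concurrent backfill reads (not a Ceph mechanism)."""

    def __init__(self, max_source_backfills: int):
        self.max_source_backfills = max_source_backfills
        self.active = Counter()  # osd id -> number of backfills currently reading from it

    def try_reserve(self, source_osds) -> bool:
        """All-or-nothing: reserve every OSD in the read plan, or none of them."""
        sources = set(source_osds)
        if any(self.active[o] >= self.max_source_backfills for o in sources):
            return False
        for o in sources:
            self.active[o] += 1
        return True

    def release(self, source_osds) -> None:
        for o in set(source_osds):
            self.active[o] -= 1


r = SourceReserver(max_source_backfills=1)
print(r.try_reserve([2]))        # True: first backfill reading from OSD 2
print(r.try_reserve([2]))        # False: OSD 2 is already at its cap
print(r.try_reserve([0, 1, 3]))  # True: a degraded read spread over 3 OSDs
```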

Thanks!
Josh
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx




