Hey all,

I wanted to confirm my understanding of some of the mechanics of backfill in EC pools. I've yet to find a document that outlines this in detail; if there is one, please send it my way. :) Some of what I write below is likely in the "well, duh" category, but I erred on the side of completeness.

First off, I understand that backfill reservations work the same way for replicated pools and EC pools. A local reservation is taken on the primary OSD, then a remote reservation on the backfill target(s), before the backfill is allowed to begin. Until then, the PG sits in the backfill_wait state.

Once the backfill begins, though, the two pool types diverge. Let's say we have an EC 3:2 (k=3, m=2) PG that's backfilling from OSD 2 to OSD 5 (formatted here like pgs_brief):

    1.1 active+remapped+backfilling [0,1,5,3,4] 0 [0,1,2,3,4] 0

The question in my mind was: where is the data for this backfill coming from? In replicated pools, all reads come from the primary. In this case, however, the primary does not have the data in question; it has to either read the EC chunk from OSD 2, or reconstruct it by reading from 3 of the OSDs in the acting set.

Based on observation, I _think_ this is what happens:

1. As long as the PG is not degraded, the backfill read is simply forwarded by the primary to OSD 2.
2. Once the PG becomes degraded, the backfill read needs to use the reconstructing path, and begins reading from 3 of the OSDs in the acting set.

Questions:

1. Can anyone confirm or correct my description of how EC backfill operates? In particular, in case 2 above, does it matter whether OSD 2 is the cause of degradation, for example? Does the read still get forwarded to a single OSD when it's parity chunks that are being moved via backfill?
2. I'm curious as to why a 3rd reservation, for the source OSD, wasn't introduced as a part of EC in Ceph.
We've occasionally seen an OSD become overloaded because several backfills were reading from it simultaneously, and there's no way to control this via the normal osd_max_backfills mechanism. Is anyone aware of discussions to this effect?

Thanks!
Josh
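
P.S. To make the two cases I described above concrete, here is a toy Python sketch of the read-source choice as I currently understand it. This is only a model of my hypothesis, not Ceph code: the function name backfill_read_sources, the "unavailable" parameter, and the choice of the first k available OSDs for reconstruction are all my own assumptions for illustration. In particular, it treats any degradation the same way, which is exactly the part I'm asking about in question 1.

```python
from typing import Iterable, List


def backfill_read_sources(
    acting: List[int],
    source_osd: int,
    k: int,
    unavailable: Iterable[int] = (),
) -> List[int]:
    """Hypothetical model: which OSDs are read to backfill one EC shard.

    acting      -- acting set of the PG, e.g. [0, 1, 2, 3, 4]
    source_osd  -- OSD currently holding the shard being moved (OSD 2 above)
    k           -- number of data chunks (3 for an EC 3:2 pool)
    unavailable -- acting-set OSDs that are down or missing their shard;
                   a non-empty value stands in for "the PG is degraded"
    """
    missing = set(unavailable)

    # Case 1 (assumed): PG not degraded, so the read goes to the single
    # OSD that already holds the chunk.
    if not missing and source_osd in acting:
        return [source_osd]

    # Case 2 (assumed): PG degraded, so the chunk is reconstructed from
    # k available shards. Which k are picked is a guess on my part.
    available = [osd for osd in acting if osd not in missing]
    if len(available) < k:
        raise ValueError("fewer than k shards available; cannot reconstruct")
    return available[:k]


if __name__ == "__main__":
    acting = [0, 1, 2, 3, 4]
    print(backfill_read_sources(acting, source_osd=2, k=3))                   # [2]
    print(backfill_read_sources(acting, source_osd=2, k=3, unavailable={2}))  # [0, 1, 3]
```

If this model is right, the overload we've seen corresponds to case 1: several PGs can return the same single source OSD at once, and nothing in the local/remote reservation scheme counts those reads.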