Re: Fwd: how io works when backfill

On Tue, 29 Dec 2015, Dong Wu wrote:
> If osd.7 is added and becomes the primary, pg1.0 [1, 2, 3]  --> pg1.0
> [7, 2, 3], is it similar to the example above?
> Is a pg_temp entry still installed mapping the PG back to [1, 2, 3], with
> backfill then happening to 7 and normal write IO going to [1, 2, 3]? And
> will IO to the portion of the PG that has already been backfilled also be
> sent to osd.7?

Yes (although I forget how it picks the ordering of the osds in the temp 
mapping).  See PG::choose_acting() for the details.
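
The routing rule confirmed here can be sketched as a toy model. This is
not Ceph code: the class ToyPG, its methods, and the idea of tracking
backfill progress as a single sorted-name cursor are assumptions made
for illustration only. It shows writes going to the temporary acting set
[1, 2, 3], plus osd.7 only when the object falls in the already-backfilled
portion of the PG.

```python
class ToyPG:
    """Hypothetical model of write fan-out during backfill (not Ceph's API)."""

    def __init__(self, acting, backfill_target, objects):
        self.acting = list(acting)              # set serving IO (via pg_temp)
        self.backfill_target = backfill_target  # OSD being filled
        self.pending = sorted(objects)          # objects still to copy
        self.last_backfill = None               # assumed progress cursor

    def backfill_next(self):
        """Copy the next object to the target and advance the cursor."""
        if self.pending:
            self.last_backfill = self.pending.pop(0)
        return self.last_backfill

    def write_targets(self, obj):
        """OSDs that receive a write to obj at this point in backfill."""
        targets = list(self.acting)
        # Only the already-backfilled portion is kept up to date on the target.
        if self.last_backfill is not None and obj <= self.last_backfill:
            targets.append(self.backfill_target)
        return targets


pg = ToyPG(acting=[1, 2, 3], backfill_target=7,
           objects=["a", "b", "c", "d", "e"])
pg.backfill_next()             # object "a" is now on osd.7
print(pg.write_targets("a"))   # [1, 2, 3, 7]
print(pg.write_targets("b"))   # [1, 2, 3]
```

In this sketch a write to an object that has not been backfilled yet never
waits on osd.7; the object simply reaches osd.7 later when the cursor
passes it.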

> How about these examples of removing an OSD:
> - pg1.0 [1, 2, 3]
> - osd.3 goes down and is removed
> - the mapping changes to [1, 2, 5], but osd.5 has no data, so a pg_temp
> entry is installed mapping the PG back to [1, 2], then backfill happens to 5
> - normal write IO goes to [1, 2]; if IO hits an object that has already
> been backfilled to osd.5, the IO is also sent to osd.5
> - when backfill completes, the pg_temp is removed and the mapping changes
> back to [1, 2, 5]

Yes

> Another example:
> - pg1.0 [1, 2, 3]
> - osd.3 goes down and is removed
> - the mapping changes to [5, 1, 2], but osd.5 has no data for the PG, so a
> pg_temp entry is installed mapping the PG back to [1, 2], in which osd.1
> temporarily becomes the primary, then backfill happens to 5
> - normal write IO goes to [1, 2]; if IO hits an object that has already
> been backfilled to osd.5, the IO is also sent to osd.5
> - when backfill completes, the pg_temp is removed and the mapping changes
> back to [5, 1, 2]
> 
> Is my analysis right?

Yep!
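
The examples confirmed above share one lifecycle, sketched below as a toy
model. Nothing here is Ceph's actual logic: the function names are
hypothetical, and the rule for building the temporary mapping (surviving
members of the previous acting set, in their previous order) is an
assumption chosen only because it reproduces the examples in this thread.
The real selection lives in PG::choose_acting().

```python
def choose_pg_temp(up, prev_acting, alive):
    """Return a temporary acting set, or None if the up set can serve IO.

    Assumed rule: OSDs from the previous acting set that are still alive
    hold the data; if any up OSD is not among them, fall back to them.
    """
    complete = [osd for osd in prev_acting if osd in alive]
    if all(osd in complete for osd in up):
        return None
    return complete


def acting_set(up, pg_temp):
    """The set that serves IO: the pg_temp override if present, else up."""
    return list(pg_temp) if pg_temp else list(up)


def simulate(up, prev_acting, alive):
    """Return (acting during backfill, backfill targets, acting afterwards)."""
    pg_temp = choose_pg_temp(up, prev_acting, alive)
    during = acting_set(up, pg_temp)
    targets = [osd for osd in up if osd not in during]
    after = acting_set(up, None)   # pg_temp removed once backfill completes
    return during, targets, after


# osd.3 removed, new mapping [5, 1, 2]
print(simulate([5, 1, 2], [1, 2, 3], {1, 2, 5}))
# ([1, 2], [5], [5, 1, 2])

# osd.7 added as primary, new mapping [7, 2, 3]
print(simulate([7, 2, 3], [1, 2, 3], {1, 2, 3, 7}))
# ([1, 2, 3], [7], [7, 2, 3])
```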

sage

> 
> 2015-12-29 1:30 GMT+08:00 Sage Weil <sage@xxxxxxxxxxxx>:
> > On Mon, 28 Dec 2015, Zhiqiang Wang wrote:
> >> 2015-12-27 20:48 GMT+08:00 Dong Wu <archer.wudong@xxxxxxxxx>:
> >> > Hi,
> >> > When an OSD is added or removed, Ceph backfills to rebalance data.
> >> > e.g.:
> >> > - pg1.0    [1, 2, 3]
> >> > - add an OSD (e.g. osd.7)
> >> > - Ceph starts backfill, then the pg1.0 OSD set changes to [1, 2, 7]
> >> > - say [a, b, c, d, e] are the objects that need to be backfilled to
> >> > osd.7, and object a is being backfilled now
> >> > - when a write IO hits object a, the IO has to wait for the backfill
> >> > of a to complete, then goes on.
> >> > - but if IO hits object b, which has not been backfilled, the IO reaches
> >> > osd.1, then osd.1 sends the IO to osd.2 and osd.7, but osd.7 does not
> >> > have object b, so osd.7 needs to wait for object b to be backfilled, then
> >> > write. Is that right? Or does osd.1 only send the IO to osd.2, not both?
> >>
> >> I think in this case, when the write of object b reaches osd.1, it
> >> holds the client write, raises the priority of the recovery of object
> >> b, and kicks off its recovery. When the recovery of object b is
> >> done, it requeues the client write, and then everything goes on as
> >> usual.
> >
> > It's more complicated than that.  In a normal (log-based) recovery
> > situation, it is something like the above: if the acting set is [1,2,3]
> > but 3 is missing the latest copy of A, a write to A will block on the
> > primary while the primary initiates recovery of A immediately.  Once that
> > completes the IO will continue.
> >
> > For backfill, it's different.  In your example, you start with [1,2,3]
> > then add in osd.7.  The OSD will see that 7 has no data for the PG and
> > install a pg_temp entry mapping the PG back to [1,2,3] temporarily.  Then
> > things will proceed normally while backfill happens to 7.  Backfill won't
> > interfere with normal IO at all, except that IO to the portion of the PG
> > that has already been backfilled will also be sent to the backfill target
> > (7) so that it stays up to date.  Once it completes, the pg_temp entry is
> > removed and the mapping changes back to [1,2,7].  Then osd.3 is allowed to
> > remove its copy of the PG.
> >
> > sage
> >
> 
> 