Re: Fwd: how io works when backfill

On Mon, 28 Dec 2015, Zhiqiang Wang wrote:
> 2015-12-27 20:48 GMT+08:00 Dong Wu <archer.wudong@xxxxxxxxx>:
> > Hi,
> > When an OSD is added or removed, Ceph backfills to rebalance data.
> > e.g.:
> > - pg1.0    [1, 2, 3]
> > - add an osd (e.g. osd.7)
> > - ceph starts backfill, then the pg1.0 osd set changes to [1, 2, 7]
> > - if [a, b, c, d, e] are objects that need to be backfilled to osd.7 and
> > object a is being backfilled now
> > - when a write io hits object a, the io needs to wait for the backfill
> > of a to complete, then goes on.
> > - but if the io hits object b, which has not been backfilled, the io reaches
> > osd.1, then osd.1 sends the io to osd.2 and osd.7, but osd.7 does not
> > have object b, so osd.7 needs to wait for object b to be backfilled, then
> > write. Is that right? Or does osd.1 only send the io to osd.2, not both?
> 
> I think in this case, when the write of object b reaches osd.1, it
> holds the client write, raises the priority of the recovery of object
> b, and kicks off the recovery of it. When the recovery of object b is
> done, it requeues the client write, and then everything goes on as
> usual.

It's more complicated than that.  In a normal (log-based) recovery 
situation, it is something like the above: if the acting set is [1,2,3] 
but 3 is missing the latest copy of A, a write to A will block on the 
primary while the primary initiates recovery of A immediately.  Once that 
completes the IO will continue.
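
A rough sketch of that log-based case, as a toy model only and not the
actual OSD code. The names handle_write, missing and recover are
hypothetical, made up for this illustration:

```python
def handle_write(obj, missing, recover):
    """Toy model of a primary handling a write during log-based recovery.

    missing: dict mapping an OSD id to the set of objects that replica
             lacks the latest copy of (hypothetical structure).
    recover: callback that recovers one object on one OSD (hypothetical).
    Returns the ordered list of steps taken, for illustration.
    """
    steps = []
    behind = [osd for osd, objs in missing.items() if obj in objs]
    if behind:
        # The write blocks on the primary until the object is recovered.
        steps.append(f"block write to {obj}")
        for osd in behind:
            recover(osd, obj)          # recovery of this object starts immediately
            missing[osd].discard(obj)
            steps.append(f"recover {obj} on osd.{osd}")
    # Once recovery completes (or was never needed), the IO continues.
    steps.append(f"apply write to {obj}")
    return steps
```

With missing = {3: {"A"}}, a write to A is blocked, A is recovered on
osd.3, and only then is the write applied. A write to any other object
is applied right away.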

For backfill, it's different.  In your example, you start with [1,2,3] 
then add in osd.7.  The OSD will see that 7 has no data for the PG and 
install a pg_temp entry mapping the PG back to [1,2,3] temporarily.  Then 
things will proceed normally while backfill happens to 7.  Backfill won't 
interfere with normal IO at all, except that IO to the portion of the PG 
that has already been backfilled will also be sent to the backfill target 
(7) so that it stays up to date.  Once backfill completes, the pg_temp 
entry is removed and the mapping changes to [1,2,7].  Then osd.3 is 
allowed to remove its copy of the PG.
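
To make the routing concrete, here is a minimal sketch of which OSDs
receive a write while backfill is in progress. It assumes, for the sake
of the illustration, that backfill walks objects in sorted order and
that a cursor (called last_backfill here) marks the last object already
copied. The function and parameter names are hypothetical and this is
not the real OSD code:

```python
def write_targets(obj, acting, backfill_target, last_backfill):
    """Return the OSDs a write to obj is sent to during backfill.

    acting:          the pg_temp acting set, e.g. [1, 2, 3]
    backfill_target: the OSD being backfilled, e.g. 7
    last_backfill:   last object already copied to the target, or None
                     if nothing has been copied yet (assumed cursor model)
    """
    targets = list(acting)  # normal IO always goes to the acting set
    if last_backfill is not None and obj <= last_backfill:
        # Object lies in the already-backfilled portion of the PG, so the
        # target also gets the write and stays up to date.
        targets.append(backfill_target)
    return targets
```

With objects a..e and last_backfill = "b", a write to "a" goes to
[1, 2, 3, 7], while a write to "c" goes only to [1, 2, 3]. Object "c"
reaches osd.7 later, when backfill gets to it, and the write never has
to wait on the backfill.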

sage



