Hi Sage,

I've been trying to solve the issue mentioned in tracker #2047, which I think is the same one I described in http://www.spinics.net/lists/ceph-devel/msg05824.html. The attached patches seem to fix it for me. I also attempted to address the local-search issue you mentioned in #2047.

I'm testing this on a cluster with 3 rows, 2 racks/row, 2 hosts/rack, and 4 OSDs/host, against a CRUSH map with the rule:

  step take root
  step chooseleaf firstn 0 type rack
  step emit

I'm in the process of testing it as follows: I wrote some data to the cluster, then started shutting down OSDs using "init-ceph stop osd.n". For the first rack's worth, I shut the OSDs down sequentially, waiting for recovery to complete each time before stopping the next OSD. For the next rack, I shut down the first 3 OSDs on a host at the same time, waited for recovery to complete, then shut down the last OSD on that host. For the remaining racks, I shut down all the OSDs on the hosts in the rack at the same time. Right now I'm waiting for recovery to complete after shutting down the third rack.

After recovery completed following each phase so far, there were no degraded objects. So this is looking fairly solid to me so far.

What do you think?

Thanks -- Jim

Jim Schutt (2):
  ceph: retry CRUSH map descent before retrying bucket
  ceph: retry CRUSH map descent from root if leaf is failed

 src/crush/mapper.c | 30 ++++++++++++++++++++++--------
 1 files changed, 22 insertions(+), 8 deletions(-)

-- 
1.7.8.2
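
P.S. To illustrate the retry-from-root idea outside the actual mapper code, here's a toy, self-contained sketch. The names, hashing, and structure below are made up for illustration only and are not taken from the patches or from src/crush/mapper.c; the point is just that when the chosen leaf is failed, the whole descent restarts from the root with a new attempt number rather than re-picking locally within the same bucket.

/*
 * Toy illustration (not the actual src/crush/mapper.c change): if the
 * leaf picked by the descent is failed, restart the whole descent from
 * the root with a new attempt number, so the replacement can land in a
 * different rack/host rather than only being re-picked locally.
 */
#include <stdio.h>
#include <stdbool.h>

#define NUM_RACKS     2
#define OSDS_PER_RACK 4

static bool osd_failed[NUM_RACKS * OSDS_PER_RACK];

/* stand-in for CRUSH's hash: mixes the input with the attempt number */
static unsigned toy_hash(unsigned x, unsigned attempt)
{
	x ^= attempt * 0x9e3779b9u;
	x ^= x >> 16;
	x *= 0x45d9f3bu;
	x ^= x >> 16;
	return x;
}

/* descend root -> rack -> osd; return -1 if the chosen leaf is failed */
static int descend_once(unsigned input, unsigned attempt)
{
	unsigned rack = toy_hash(input, attempt) % NUM_RACKS;
	unsigned osd  = toy_hash(input ^ rack, attempt) % OSDS_PER_RACK;
	int id = rack * OSDS_PER_RACK + osd;

	return osd_failed[id] ? -1 : id;
}

/* retry the *whole* descent from the root until a live leaf is found */
static int choose_leaf(unsigned input, unsigned max_retries)
{
	for (unsigned attempt = 0; attempt < max_retries; attempt++) {
		int id = descend_once(input, attempt);
		if (id >= 0)
			return id;
	}
	return -1;	/* give up: every descent landed on a failed leaf */
}

int main(void)
{
	/* mark one whole rack's worth of OSDs down */
	for (int i = 0; i < OSDS_PER_RACK; i++)
		osd_failed[i] = true;

	for (unsigned pg = 0; pg < 8; pg++)
		printf("pg %u -> osd %d\n", pg, choose_leaf(pg, 10));
	return 0;
}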