[RFC PATCH 1/2] ceph: retry CRUSH map descent before retrying bucket

For the first few rejections or collisions, we'll retry the descent to
keep objects spread across the cluster.  After that, we'll fall back
to exhaustive search of the bucket to avoid trying forever in the event
a bucket has only a few "in" items and the hash doesn't do a good job of
finding them.

Signed-off-by: Jim Schutt <jaschut@xxxxxxxxxx>
---
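For review purposes, the retry policy this patch introduces can be
sketched as a standalone function.  This is only an illustration of the
two conditions touched by the diff, not code from mapper.c: the enum,
the function names and the parameter bucket_size (standing in for
in->size) are hypothetical, and whatever branches follow the
"else if" in the unshown context are collapsed into a single
RETRY_FALLTHROUGH value.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of these exist in mapper.c. */
enum retry_action {
	RETRY_DESCENT,		/* restart from the top of the hierarchy */
	RETRY_BUCKET,		/* stay in the current bucket and try again */
	RETRY_FALLTHROUGH	/* left to the code not shown in this diff */
};

/*
 * Mirrors the patched test in the first hunk: once the total failure
 * count reaches orig_tries, pick items with bucket_perm_choose()
 * (exhaustive) instead of crush_bucket_choose() (hashed).
 */
static bool use_exhaustive_choose(unsigned int ftotal, unsigned int orig_tries)
{
	return ftotal >= orig_tries;
}

/*
 * Mirrors the patched branch in the second hunk.  ftotal and flocal are
 * the values after both counters have been incremented for the current
 * rejection or collision.
 */
static enum retry_action next_retry(unsigned int ftotal, unsigned int flocal,
				    unsigned int bucket_size,
				    unsigned int orig_tries)
{
	if (ftotal <= orig_tries)
		return RETRY_DESCENT;
	if (flocal <= bucket_size + orig_tries)
		return RETRY_BUCKET;
	return RETRY_FALLTHROUGH;
}
```

One consequence visible in the sketch: when ftotal equals orig_tries,
the descent is retried once more and the choice made during that
descent already uses the exhaustive search, since the first hunk tests
with ">=" while the second tests with "<=".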
 src/crush/mapper.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/crush/mapper.c b/src/crush/mapper.c
index 8857577..e5dc950 100644
--- a/src/crush/mapper.c
+++ b/src/crush/mapper.c
@@ -350,8 +350,7 @@ static int crush_choose(const struct crush_map *map,
 					reject = 1;
 					goto reject;
 				}
-				if (flocal >= (in->size>>1) &&
-				    flocal > orig_tries)
+				if (ftotal >= orig_tries)	 /* exhaustive bucket search */
 					item = bucket_perm_choose(in, x, r);
 				else
 					item = crush_bucket_choose(in, x, r);
@@ -420,10 +419,19 @@ reject:
 				if (reject || collide) {
 					ftotal++;
 					flocal++;
-
-					if (collide && flocal < 3)
-						/* retry locally a few times */
-						retry_bucket = 1;
+					/*
+					 * For the first couple rejections or collisions,
+					 * we'll retry the descent to keep objects spread
+					 * across the cluster.  After that, we'll fall back
+					 * to exhaustive search of buckets to avoid trying
+					 * forever in the event a bucket has only a few
+					 * "in" items and the hash doesn't do a good job
+					 * of finding them.  Note that we need to retry
+					 * descent during that phase so that multiple
+					 * buckets can be exhaustively searched.
+					 */
+					if (ftotal <= orig_tries)
+						retry_descent = 1;
 					else if (flocal <= in->size + orig_tries)
 						/* exhaustive bucket search */
 						retry_bucket = 1;
-- 
1.7.8.2

