For the first few rejections or collisions, we'll retry the descent to
keep objects spread across the cluster.  After that, we'll fall back to
exhaustive search of the bucket to avoid trying forever in the event a
bucket has only a few "in" items and the hash doesn't do a good job of
finding them.

Signed-off-by: Jim Schutt <jaschut@xxxxxxxxxx>
---
 src/crush/mapper.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/crush/mapper.c b/src/crush/mapper.c
index 8857577..e5dc950 100644
--- a/src/crush/mapper.c
+++ b/src/crush/mapper.c
@@ -350,8 +350,7 @@ static int crush_choose(const struct crush_map *map,
 					reject = 1;
 					goto reject;
 				}
-				if (flocal >= (in->size>>1) &&
-				    flocal > orig_tries)
+				if (ftotal >= orig_tries) /* exhaustive bucket search */
 					item = bucket_perm_choose(in, x, r);
 				else
 					item = crush_bucket_choose(in, x, r);
@@ -420,10 +419,19 @@ reject:
 				if (reject || collide) {
 					ftotal++;
 					flocal++;
-
-					if (collide && flocal < 3)
-						/* retry locally a few times */
-						retry_bucket = 1;
+					/*
+					 * For the first couple rejections or collisions,
+					 * we'll retry the descent to keep objects spread
+					 * across the cluster.  After that, we'll fall back
+					 * to exhaustive search of buckets to avoid trying
+					 * forever in the event a bucket has only a few
+					 * "in" items and the hash doesn't do a good job
+					 * of finding them.  Note that we need to retry
+					 * descent during that phase so that multiple
+					 * buckets can be exhaustively searched.
+					 */
+					if (ftotal <= orig_tries)
+						retry_descent = 1;
 					else if (flocal <= in->size + orig_tries)
 						/* exhaustive bucket search */
 						retry_bucket = 1;
-- 
1.7.8.2
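
For readers who want the post-patch retry policy in isolation, here is
a minimal standalone sketch of the two conditions the hunks introduce.
It is not code from src/crush/mapper.c: the enum, the function names
next_retry() and use_exhaustive_search(), and the bucket_size parameter
(standing in for in->size) are hypothetical, and the sketch assumes
ftotal and flocal have already been incremented for the current failure.
What the caller does when neither retry flag is set lies outside the
hunks shown, so it appears here only as RETRY_NONE.

```c
/* Hypothetical names for illustration; not part of src/crush/mapper.c. */
enum retry_action {
	RETRY_DESCENT,	/* corresponds to retry_descent = 1 */
	RETRY_BUCKET,	/* corresponds to retry_bucket = 1 */
	RETRY_NONE	/* neither flag set; handling is outside the patch */
};

/*
 * Mirror of the second hunk.  ftotal and flocal are the failure
 * counts *after* the increment for the current rejection/collision;
 * bucket_size stands in for in->size.
 */
static enum retry_action next_retry(unsigned int ftotal,
				    unsigned int flocal,
				    unsigned int bucket_size,
				    unsigned int orig_tries)
{
	if (ftotal <= orig_tries)
		return RETRY_DESCENT;
	if (flocal <= bucket_size + orig_tries)
		return RETRY_BUCKET;
	return RETRY_NONE;
}

/*
 * Mirror of the first hunk: nonzero when the next attempt would use
 * bucket_perm_choose() rather than crush_bucket_choose().
 */
static int use_exhaustive_search(unsigned int ftotal,
				 unsigned int orig_tries)
{
	return ftotal >= orig_tries;
}
```

With orig_tries = 5 and a 4-item bucket, this sketch retries the descent
while ftotal is at most 5 and retries within the bucket while flocal is
at most 9.  Note that the two tests overlap at ftotal == orig_tries: the
descent is retried once more, and that attempt already selects with the
exhaustive chooser.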