[PATCH] diff-delta.c: Rationalize culling of hash buckets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The previous hash bucket culling resulted in a somewhat unpredictable
number of hash bucket entries in the order of magnitude of HASH_LIMIT.

Replace this with a Bresenham-like algorithm leaving us with exactly
HASH_LIMIT entries by uniform culling.
---

This is assuming that the previous patch series has been applied
(lacking any final comments or Acks on it; but I think I addressed the
issues).  The hash bucket code culling has really annoyed me with its
unintuitive results.  If the preceding patch series is thrown out (I
can't think why it would), one could rework this to fit the current
master if really needed.

 diff-delta.c |   41 +++++++++++++++++++++++++++++++----------
 1 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/diff-delta.c b/diff-delta.c
index e7c33aa..98795c6 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -207,19 +207,40 @@ struct delta_index * create_delta_index(const void *buf, unsigned long bufsize)
 	 * the reference buffer.
 	 */
 	for (i = 0; i < hsize; i++) {
-		if (hash_count[i] < HASH_LIMIT)
+		int acc;
+
+		if (hash_count[i] <= HASH_LIMIT)
 			continue;
+
+		entries -= hash_count[i] - HASH_LIMIT;
+		/* We leave exactly HASH_LIMIT entries in the bucket */
+
 		uentry = uhash[i];
+		acc = 0;
 		do {
-			struct unpacked_index_entry *keep = uentry;
-			int skip = hash_count[i] / HASH_LIMIT;
-			do {
-				--entries;
-				uentry = uentry->next;
-			} while(--skip && uentry);
-			++entries;
-			keep->next = uentry;
-		} while(uentry);
+			acc += hash_count[i] - HASH_LIMIT;
+			if (acc > 0) {
+				struct unpacked_index_entry *keep = uentry;
+				do {
+					uentry = uentry->next;
+					acc -= HASH_LIMIT;
+				} while (acc > 0);
+				keep->next = uentry->next;
+			}
+			uentry = uentry->next;
+		} while (uentry);
+
+		/* Assume that this loop is gone through exactly
+		 * HASH_LIMIT times and is entered and left with
+		 * acc==0.  So the first statement in the loop
+		 * contributes (hash_count[i]-HASH_LIMIT)*HASH_LIMIT
+		 * to the accumulator, and the inner loop consequently
+		 * is run (hash_count[i]-HASH_LIMIT) times, removing
+		 * one element from the list each time.  Since acc
+		 * balances out to 0 at the final run, the inner loop
+		 * body can't be left with uentry==NULL.  So we indeed
+		 * encounter uentry==NULL in the outer loop only.
+		 */
 	}
 	free(hash_count);
 
-- 
1.5.3.GIT

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux