The previous hash bucket culling resulted in a somewhat unpredictable number of hash bucket entries in the order of magnitude of HASH_LIMIT. Replace this with a Bresenham-like algorithm leaving us with exactly HASH_LIMIT entries by uniform culling. Signed-off-by: David Kastrup <dak@xxxxxxx> --- This is to be applied on top of the patch for packing the index structure. Which is the reason it is posted as a followup to the respective article. In contrast to that patch, this patch has only been tested/reviewed by myself as to yet. diff-delta.c | 41 +++++++++++++++++++++++++++++++---------- 1 files changed, 31 insertions(+), 10 deletions(-) diff --git a/diff-delta.c b/diff-delta.c index d355e5e..1a8d6fb 100644 --- a/diff-delta.c +++ b/diff-delta.c @@ -207,19 +207,40 @@ struct delta_index * create_delta_index(const void *buf, unsigned long bufsize) * the reference buffer. */ for (i = 0; i < hsize; i++) { - if (hash_count[i] < HASH_LIMIT) + int acc; + + if (hash_count[i] <= HASH_LIMIT) continue; + + entries -= hash_count[i] - HASH_LIMIT; + /* We leave exactly HASH_LIMIT entries in the bucket */ + entry = hash[i]; + acc = 0; do { - struct unpacked_index_entry *keep = entry; - int skip = hash_count[i] / HASH_LIMIT; - do { - --entries; - entry = entry->next; - } while(--skip && entry); - ++entries; - keep->next = entry; - } while(entry); + acc += hash_count[i] - HASH_LIMIT; + if (acc > 0) { + struct unpacked_index_entry *keep = entry; + do { + entry = entry->next; + acc -= HASH_LIMIT; + } while (acc > 0); + keep->next = entry->next; + } + entry = entry->next; + } while (entry); + + /* Assume that this loop is gone through exactly + * HASH_LIMIT times and is entered and left with + * acc==0. So the first statement in the loop + * contributes (hash_count[i]-HASH_LIMIT)*HASH_LIMIT + * to the accumulator, and the inner loop consequently + * is run (hash_count[i]-HASH_LIMIT) times, removing + * one element from the list each time. Since acc + * balances out to 0 at the final run, the inner loop + * body can't be left with entry==NULL. So we indeed + * encounter entry==NULL in the outer loop only. + */ } free(hash_count); -- 1.5.3.GIT - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html