On Thu, 23 Aug 2007, David Kastrup wrote:

> I am currently looking at what can be done to speed up deltaing. The
> following code can be found here:
>
> 	for (i = 0; i < hsize; i++) {
> 		if (hash_count[i] < HASH_LIMIT)
> 			continue;
> 		entry = hash[i];
> 		do {
> 			struct index_entry *keep = entry;
> 			int skip = hash_count[i] / HASH_LIMIT / 2;
> 			do {
> 				entry = entry->next;
> 			} while(--skip && entry);
> 			keep->next = entry;
> 		} while(entry);
> 	}
>
> If I analyze what happens for various values of hash_count[i], I get
> the following (the first case is by far the worst):
>
> HASH_LIMIT <= hash_count[i] < 2*HASH_LIMIT:
> skip = 0;
> --skip wraps to -1 and stays nonzero, so the do .. while only
> terminates when entry == 0, and keep->next is set to 0 -> all hashes
> except the first one get dropped. Result is new_hash_count = 1.
> Ugh, ugh, ugh.
>
> 2*HASH_LIMIT <= hash_count[i] < 4*HASH_LIMIT:
> skip = 1;
> do .. while advances by a single entry, so keep->next is reassigned
> the value it already had and nothing is dropped at all; new_hash_count
> stays between 2*HASH_LIMIT and 4*HASH_LIMIT.
>
> 4*HASH_LIMIT <= hash_count[i] < 6*HASH_LIMIT:
> skip = 2;
> do .. while advances by two entries, so every second value is
> dropped; result is that 2*HASH_LIMIT <= new_hash_count < 3*HASH_LIMIT.
>
> And so on: only one of every skip entries survives, so the bucket
> size tends toward 2*HASH_LIMIT instead of HASH_LIMIT. It would appear
> that if HASH_LIMIT is supposed to do what it is seemingly intended
> for, the skip calculation has to be just
>
> 	int skip = hash_count[i] / HASH_LIMIT;
>
> Otherwise, there is completely broken behavior for values between
> HASH_LIMIT and 2*HASH_LIMIT (where only a single hash survives), and
> for larger values, the limit will be around 2*HASH_LIMIT rather than
> HASH_LIMIT as was probably intended.

You're absolutely right. I don't know what I was thinking when I wrote
that code, but the /2 is bogus. Please send a patch to Junio with my
ACK.


Nicolas
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html