On Sat, Mar 30, 2013 at 12:15 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> writes: > >> If a position in object hash table is taken, we currently check out >> the next one. This could potentially create long object chains. We >> could create linked lists instead and leave the next slot alone. > > In the current code, not just the logic in lookup_object(), but the > logic to enforce load factor when create_object() decides to call > grow_object_hash() and object enumeration implemented by > get_max_object_index() and get_indexed_object() are closely tied to > the open addressing scheme. If you want to switch to any other > method (e.g. separate chaining) these need to be updated quite a > bit. > > I do not see get_max_object_index() and get_indexed_object() updated > at all. Do fsck, index-pack, name-rev and upload-pack still work? No, apparently not. I should have been hit hard by not updating grow_object_hash(). But I think it was ok because I was on top of my preallocation patch and there probably were not many chains before that patch kicked in. > This particular implementation that uses a fake "union" is a bit > ugly, but in general, it may be worth rethinking the object hash > implementation. I vaguely recall trying cuckoo in the vicinity of > this code. Yeah I saw that. Need to read and maybe try again some time. Still playing with the linked list idea. If every time we find something in a chain, we swap it to top (with hope it'll be accessed more often), then the number of traversing in chains goes down from 33m times to 21.5m times. If we just push it up one node in the linked list, it's 21.3m times. Interesting. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html