On Wed, Sep 26, 2018 at 2:28 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > +struct refname_hash_entry { > + struct hashmap_entry ent; /* must be the first member */ $ git grep "struct hashmap_entry" reveals that we have another convention that we follow but not document, which is to stress the importance of putting the hashmap_entry first. ;-) > + struct object_id oid; > + char refname[FLEX_ARRAY]; > +}; > + > +static int refname_hash_entry_cmp(const void *hashmap_cmp_fn_data, > + const struct refname_hash_entry *e1, > + const struct refname_hash_entry *e2, > + const void *keydata) > +{ > + return strcmp(e1->refname, keydata ? keydata : e2->refname); > +} and later hashmap_init(... (hashmap_cmp_fn)refname_hash_entry_cmp, ...); I wonder if we'd want to stick to this style, as that seems to be the easiest to do, and drop the efforts started in 55c965f3a2 (Merge branch 'sb/hashmap-cleanup', 2017-08-11), that avoids the cast in the init, but puts the (implicit) casts in the _cmp function as we'd take void pointers as the function signature and recast to a local variable. > + > +static struct refname_hash_entry *refname_hash_add(struct hashmap *map, > + const char *refname, > + const struct object_id *oid) > +{ > + struct refname_hash_entry *ent; > + size_t len = strlen(refname); > + > + FLEX_ALLOC_MEM(ent, refname, refname, len); > + hashmap_entry_init(ent, memhash(refname, len)); memhash is preferred over strhash as we already know the length. For a second I wondered why we use it instead of strhash > +static int refname_hash_exists(struct hashmap *map, const char *refname) > +{ > + struct hashmap_entry key; > + size_t len = strlen(refname); > + hashmap_entry_init(&key, memhash(refname, len)); Here we could use strhash, but as they could change to differing implementation, we keep using memhash. > /* > - * For all the tags in the remote_refs string list, > + * For all the tags in the remote_refs_list, > * add them to the list of refs to be fetched > */ > - for_each_string_list_item(item, &remote_refs) { > + for_each_string_list_item(remote_ref_item, &remote_refs_list) { > + const char *refname = remote_ref_item->string; > + struct hashmap_entry key; > + > + hashmap_entry_init(&key, memhash(refname, strlen(refname))); > + item = hashmap_get(&remote_refs, &key, refname); > + if (!item) > + continue; /* can this happen??? */ I thought this could happen when we have hash collisions, I convinced myself otherwise, as we pass the refname to hashmap_get as the 'keydata', so when there is a hash collision we keep comparing to the real value. And as whenever we have an insert to remote_refs_list, we also have a refname_hash_add to the remote_refs hashmap, I think the return of NULL is not possible, and we could switch it to BUG(...); > All this code predates more modern in-core lookup API like hashmap; > replace them with two hashmaps and one string list---the hashmaps > are used for look-ups and the string list is to keep the order of > items in the returned result stable (which is the only single thing > hashmap does worse than lookups on string-list). > > Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > --- I wonder if we want to add an API to the hashmap that allows for ordered iteration (as hashmap_iter_* relies on the hashing function for its output) I think it may be too early to do so, but there are 2 thoughts on how to do it: * Each element keeps (prev/next) pointers to the previously inserted element and updates the next pointer when the next element is inserted. When removing elements we'd have to update the next and prev pointer of the adjacent elements. Then iteration can be done in insertion-order. Probably this would go into its own file ordered-hashmap.{c, h} * provide an order function to hashmap_iter_init, which then orders according to that function before the first call of hashmap_iter_next returns. > * Just to remind ourselves that we talked about getting rid of > string-list used as look-up tables by replacing them with hashmap > but haven't seen much effort expended on it. I think this is my > first semi-serious use of hashmap, and the code is expected to be > full of newbie mistakes, inefficiencies and ignorance of idioms; > pointing out any of which is much appreciated. I could not find newbie mistakes, but gave my thoughts anyway. Stefan