On 07.07.2014 19:43, Junio C Hamano wrote:
> Karsten Blees <karsten.blees@xxxxxxxxx> writes:
>
>> Hashmap entries are typically looked up by just a key. The hashmap_get()
>> API expects an initialized entry structure instead, to support compound
>> keys. This flexibility is currently only needed by find_dir_entry() in
>> name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other
>> (currently five) call sites of hashmap_get() have to set up a near emtpy
>
> s/emtpy/empty/;
>
>> entry structure, resulting in duplicate code like this:
>>
>>   struct hashmap_entry keyentry;
>>   hashmap_entry_init(&keyentry, hash(key));
>>   return hashmap_get(map, &keyentry, key);
>>
>> Add a hashmap_get_from_hash() API that allows hashmap lookups by just
>> specifying the key and its hash code, i.e.:
>>
>>   return hashmap_get_from_hash(map, hash(key), key);
>
> While I think the *_get_from_hash() is an improvement over the
> three-line sequence, I find it somewhat strange that callers of it
> still must compute the hash code themselves, instead of letting the
> hashmap itself to apply the user supplied hash function to the key
> to derive it.  After all, the map must know how to compare two
> entries via a user-supplied cmpfn given at the map initialization
> time, and the algorithm to derive the hash code falls into the same
> category, in the sense that both must stay the same during the
> lifetime of a hashmap, no?  Unless there is a situation where you
> need to be able to initialize a hashmap_entry without knowing which
> hashmap the entry will eventually be stored, it feels dubious API
> that exposes to the outside callers hashmap_entry_init() that takes
> the hash code without taking the hashmap itself.
>
> In other words, why isn't hashmap_get() more like this:
>
>   void *hashmap_get(const struct hashmap *map, const void *key)
>   {
>           struct hashmap_entry keyentry;
>           hashmap_entry_init(&keyentry, map->hash(key));
>           return *find_entry_ptr(map, &keyentry, key);
>   }
>
> with hashmap_entry_init() purely a static helper in hashmap.c?
>

1. Performance

Calculating hash codes is the most expensive operation when working
with hash tables (unless you already have a good hash such as sha1).
And using hash tables as a cache of some sort is probably the most
common use case. So you'll often have code like this:

 1  unsigned int h = hash(key);
 2  struct my_entry *e = hashmap_get_from_hash(&map, h, key);
 3  if (!e) {
 4          e = malloc(sizeof(*e));
 5          hashmap_entry_init(e, h);
 6          e->key = key;
 7          hashmap_add(&map, e);
 8  }

Note that the hash code from line 1 can be reused in line 5. You cannot
do that if calculating the hash code is buried in the implementation.
Callbacks cannot be inlined either...

2. Simplicity

Most APIs take a user defined hashmap_entry structure, so you'd either
need two hash functions, or a hash function that can distinguish between
the 'entry' and 'key-only' case, e.g.:

  unsigned int my_hash(const struct my_entry *entry, const void *keydata)
  {
          if (keydata)
                  return strhash(keydata);
          else
                  return strhash(entry->key);
  }

Hashmap clients will typically provide small, type safe wrappers around
the hashmap API. That's perfect for calculating the hash code without
breaking encapsulation. IMO there's no need to make things more complex
by requiring another callback.
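For illustration, such a wrapper could look roughly like this (a minimal
sketch, not code from the patch; 'my_entry' and 'my_map_get' are made-up
names, strhash() is the string hash from hashmap.h):

  /* assumes "hashmap.h" is included */
  struct my_entry {
          struct hashmap_entry ent; /* must be the first member */
          const char *key;
  };

  static struct my_entry *my_map_get(const struct hashmap *map,
                                     const char *key)
  {
          /* the wrapper computes the hash; callers never see it */
          return hashmap_get_from_hash(map, strhash(key), key);
  }

Callers just say my_map_get(&map, "foo"), and the hash code stays an
implementation detail of the wrapper rather than of hashmap.c.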
3. Compound keys

The key doesn't always consist of just a single word. E.g. for
struct dir_entry, the key is [char *name, int len]. So an API like
this:

  void *hashmap_get(const struct hashmap *map, const void *key)

won't do in the general case (unless you require clients to define
their own key structure in addition to the entry structure...).
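For comparison, this is roughly what find_dir_entry() in name-hash.c
does (a simplified sketch from memory, struct members trimmed): the
length goes into a stack-allocated key entry, the name is passed as
keydata, and the cmpfn can compare both parts:

  struct dir_entry {
          struct hashmap_entry ent;
          /* ... */
          unsigned int namelen;
          char name[FLEX_ARRAY];
  };

  static struct dir_entry *find_dir_entry(struct index_state *istate,
                                          const char *name,
                                          unsigned int namelen)
  {
          struct dir_entry key;
          /* dir_hash is case-insensitive, hence memihash() */
          hashmap_entry_init(&key, memihash(name, namelen));
          key.namelen = namelen;
          return hashmap_get(&istate->dir_hash, &key, name);
  }

The key here is effectively (name, namelen), which a single
'const void *key' parameter cannot express without yet another
client-defined key structure.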