Stefan Beller wrote: > Subject: hashmap: migrate documentation from Documentation/technical into header > > While at it, clarify the use of `key`, `keydata`, `entry_or_key` as well > as documenting the new data pointer for the compare function. > > Signed-off-by: Stefan Beller <sbeller@xxxxxxxxxx> > --- > Documentation/technical/api-hashmap.txt | 309 -------------------------------- > hashmap.h | 249 +++++++++++++++++++++++-- > 2 files changed, 231 insertions(+), 327 deletions(-) > delete mode 100644 Documentation/technical/api-hashmap.txt Yay, I love this. [...] > --- a/Documentation/technical/api-hashmap.txt > +++ /dev/null > @@ -1,309 +0,0 @@ Verified that these docs have all been migrated to the header, except where noted below. [...] > -`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`:: > - > - User-supplied function to test two hashmap entries for equality. Shall > - return 0 if the entries are equal. > -+ > -This function is always called with non-NULL `entry` / `entry_or_key` > -parameters that have the same hash code. When looking up an entry, the `key` > -and `keydata` parameters to hashmap_get and hashmap_remove are always passed > -as second and third argument, respectively. Otherwise, `keydata` is NULL. This was really heard to read in the preimage. Thanks for cleaning it up. [...] > -Usage example > -------------- > - > -Here's a simple usage example that maps long keys to double values. What happened to this? I think it would be useful to include in a long comment at the top of the header. A worked example like this makes it easier to get one's bearings and know which functions to look at. [...] > -Using variable-sized keys > -------------------------- Same question: should this discussion go in the hashmap_get() docstring, with a pointer from hashmap_remove()? Or should it go in test-hashmap.c, with a pointer in the docstrings of both? [...] > +++ b/hashmap.h > @@ -3,11 +3,18 @@ > > /* > * Generic implementation of hash-based key-value mappings. > - * See Documentation/technical/api-hashmap.txt. > + * > */ nit: why the blank line? > > -/* FNV-1 functions */ > - > +/* > + * Ready-to-use hash functions for strings, using the FNV-1 algorithm (see > + * http://www.isthe.com/chongo/tech/comp/fnv). > + * `strhash` and `strihash` take 0-terminated strings, while `memhash` and > + * `memihash` operate on arbitrary-length memory. > + * `strihash` and `memihash` are case insensitive versions. > + * `memihash_cont` is a variant of `memihash` that allows a computation to be > + * continued with another chunk of data. > + */ > extern unsigned int strhash(const char *buf); > extern unsigned int strihash(const char *buf); > extern unsigned int memhash(const void *buf, size_t len); > @@ -16,6 +23,15 @@ extern unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_ > > static inline unsigned int sha1hash(const unsigned char *sha1) > { > + /* > + * Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code > + * for use in hash tables. Cryptographic hashes are supposed to have > + * uniform distribution, so in contrast to `memhash()`, this just copies > + * the first `sizeof(int)` bytes without shuffling any bits. Note that > + * the results will be different on big-endian and little-endian > + * platforms, so they should not be stored or transferred over the net. > + */ This comment should go in front of the function instead of in its body, like this: /* * Converts a crypto [...] */ static inline unsigned int sha1hash(... because it is describing the purpose and usage of the function and not its implementation. [...] > +/* > + * User-supplied function to test two hashmap entries for equality. Shall > + * return 0 if the entries are equal. > + * > + * This function is always called with non-NULL `entry` and `entry_or_key` > + * parameters that have the same hash code. > + * > + * When looking up an entry, the `key` and `keydata` parameters to hashmap_get > + * and hashmap_remove are always passed > + * as second `entry_or_key` and third argument `keydata`, respectively. > + * Otherwise, `keydata` is NULL. strange wrapping > + * > + * There are two modes for this function to be used: > + * (a) When looking for an exact match of a given `entry`, then `keydata` > + * ought to be NULL, and this function should cast `entry` and > + * `entry_or_key` to the user entries and check for equality. What does it mean that `keydata` ought to be NULL? Are you saying it will be NULL, or that someone has to ensure it's NULL, or something else? > + * (b) When it is too expensive to allocate such a user entry (either because > + * it is large or varialbe sized, such that it is not on the stack), > + * Then the relevant data to check for equality should be passed via > + * ` keydata`. stray space in "` keydata`". > + * > + * Resulting from these modes, this function should compare `keydata` to `entry` > + * when `keydata` is not NULL. `entry_or_key` may be a bogus hashmap_entry (see > + * hashmap_get_from_hash). What is meant by a bogus hashmap_entry here? (See also below.) [...] > +/* > + * struct hashmap is the hash table structure. Members can be used as follows, > + * but should not be modified directly. > + */ > struct hashmap { [...] > + /* Prevent automatic rehashes during inserts and deletes when set. */ > + unsigned disallow_rehash : 1; > }; Please add a reference to hashmap_disallow_rehash so people know how to set this. > > /* hashmap functions */ > > +/* > + * Initializes a hashmap structure. > + * > + * `map` is the hashmap to initialize. > + * > + * The `equals_function` can be specified to compare two entries for equality. > + * If NULL, entries are considered equal if their hash codes are equal. > + * > + * When the compare function needs specific user supplied `data`, it should nit: it's not user supplied so much as caller supplied. Maybe this can say something like The `data` parameter can be used to provide additional data (a callback cookie) that will be passed to `equals_function` each time it is called. This allows a single `equals_function` to implement multiple comparison functions. [...] > +/* > + * Frees a hashmap structure and allocated memory. > + * > + * If `free_entries` is true, each hashmap_entry in the map is freed as well. > + */ nit: Documentation/technical said the hashmap_entrys get freed using free(). Can't hurt to carry that note over, to save the reader from hunting down whether the struct hashmap allows setting a custom free function. [...] > +/* > + * Returns the hashmap entry for the specified key, or NULL if not found. > + * > + * `map` is the hashmap structure. > + * `key` is a user data structure that starts with hashmap_entry that has at > + * least been initialized with the proper hash code (via `hashmap_entry_init`). nit: strange alignment (extra space before "least"). Should these argument lists be formatted as a list somehow (e.g. leading "- " bullets, or with a blank line between them? (I don't have a strong opinion, just noticed they can be hard to read with everything run together.) > + * `keydata` is a data structure that holds just enough information to check > + * for equality to a given entry. > + * > + * One of `key` or `keydata` shall be NULL. > + * > + * If an entry with matching hash code is found, `key` and `keydata` are passed > + * to `hashmap_cmp_fn` to decide whether the entry matches the key. > + */ > extern void *hashmap_get(const struct hashmap *map, const void *key, [...] > + * Removes a hashmap entry matching the specified key. If the hashmap > + * contains duplicate entries equal to the specified key, only one of > + * them will be removed. > + * > + * `map` is the hashmap structure. > + * `key` is a user data structure that starts with hashmap_entry that has at > + * least been initialized with the proper hash code (via `hashmap_entry_init`). nit: strange alignment again [...] > extern void *hashmap_remove(struct hashmap *map, const void *key, [...] > +/* > + * Returns the hashmap entry for the specified hash code and key data, > + * or NULL if not found. > + * > + * `map` is the hashmap structure. > + * `hash` is the hash code of the entry to look up. > + * > + * If an entry with matching hash code is found, `keydata` is passed to > + * `hashmap_cmp_fn` to decide whether the entry matches the key. The > + * `entry_or_key` parameter points to a bogus hashmap_entry structure that > + * should not be used in the comparison. What is going on with this note about bogus entry_or_key? I have no idea what it is trying to say, neither in the preimage nor the postimage. > + */ > static inline void *hashmap_get_from_hash(const struct hashmap *map, > - unsigned int hash, const void *keydata) > + unsigned int hash, > + const void *keydata) In Documentation/technical, hashmap_get_from_hash was immediately after hashmap_get, making it more discoverable. Should the descriptions of these two functions mention each other to bring about a similar effect? [...] > @@ -91,7 +273,7 @@ int hashmap_bucket(const struct hashmap *map, unsigned int hash); > * manner appropriate to their usage. This simply > * prevents the table from being unexpectedly re-mapped. > * > - * If is up to the caller to ensure that the hashmap is > + * It is up to the caller to ensure that the hashmap is > * initialized to a reasonable size to prevent poor > * performance. > * The Documentation/technical/ text says: A call to allow rehashing does not force a rehash; that might happen with the next insert or delete. which I find easier on the eyes than * When value=0, allow future rehahses. This DOES NOT force * a rehash now. Why not make this comment use the full description from Documentation/technical/ verbatim? > @@ -104,10 +286,28 @@ static inline void hashmap_disallow_rehash(struct hashmap *map, unsigned value) [...] > -/* hashmap_iter functions */ > > +/* > + * hashmap_iter functions > + * > + * Used to iterate over all entries of a hashmap. Note that it is > + * not safe to add or remove entries to the hashmap while > + * iterating. > + */ > + > +struct hashmap_iter { nit: might be simpler to associate this comment with the struct: /* * Used to iterate over ... */ struct hashmap_iter { ... } That way, the scope of what the comment is describing is clearer. When looking up the functions, I'm likely to look up the hashmap_iter type they use, so I'd end up reading this comment anyway. > + struct hashmap *map; > + struct hashmap_entry *next; > + unsigned int tablepos; > +}; > + > +/* Initializes a `hashmap_iter` structure. */ > extern void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter); > + > +/* Returns the next hashmap_entry, or NULL if there are no more entries. */ > extern void *hashmap_iter_next(struct hashmap_iter *iter); > + > +/* Initializes the iterator and returns the first entry, if any). */ Stray paren? Probably worth mentioning this is a convenience wrapper for iter_init + iter_next, like the Documentation/technical/ text did. > static inline void *hashmap_iter_first(struct hashmap *map, The rest looks good. Thanks and hope that helps, Jonathan