[PATCH 1/5] implement generic key/value map

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



It is frequently useful to have a fast, generic data
structure mapping keys to values. We already have something
like this in the "decorate" API, but it has two downsides:

  1. The key type must always be a "struct object *".

  2. The value type is a void pointer, which means it is
     inefficient and cumbersome for storing small values.
     One must either encode their value inside the void
     pointer, or allocate additional storage for the pointer
     to point to.

This patch introduces a generic map data structure, mapping
keys of arbitrary type to values of arbitrary type.

One possible strategy for implementation is to have a struct
that points to a sequence of bytes for each of the key and
the value, and to try to treat them as opaque in the code.
However, this code gets complex, has a lot of casts, and
runs afoul of violating alignment and strict aliasing rules.

This patch takes a different approach. We parameterize the
types in each map by putting the declarations and
implementations inside macros, and expand the macros with
the correct types. This lets the compiler see the actual
code, with its real types, and figure out things like struct
packing and alignment itself.

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
 Documentation/technical/api-map.txt |  160 +++++++++++++++++++++++++++++++++++
 Makefile                            |    2 +
 map.c                               |   86 +++++++++++++++++++
 map.h                               |   26 ++++++
 4 files changed, 274 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/technical/api-map.txt
 create mode 100644 map.c
 create mode 100644 map.h

diff --git a/Documentation/technical/api-map.txt b/Documentation/technical/api-map.txt
new file mode 100644
index 0000000..4153ef1
--- /dev/null
+++ b/Documentation/technical/api-map.txt
@@ -0,0 +1,160 @@
+map API
+=======
+
+The map API is a system for efficiently mapping keys to values in memory. Items
+are stored in a hash table for fast lookup; storage efficiency is achieved
+through macro-based code generation, which lets the compiler store values
+compactly in memory.
+
+Due to the code generation, there are two different facets of this API: macros
+to build new types of mappings (i.e., generate new function and struct
+definitions), and generated functions to store and retrieve values from a
+particular mapping.
+
+
+Related APIs
+------------
+
+The hash API provides a similar key/value store. However, it does not deal with
+hash collisions itself, leaving the caller to handle bucket management (but
+this is a feature if you are interested in using the collisions as part of an
+algorithm).  Furthermore, it can store only void pointers, making storage of
+small values inefficient and cumbersome.
+
+The decorate API provides a similar interface to map, but is restricted to
+using "struct object" as the key, and a void pointer as the value.
+
+
+Defining New Map Types
+----------------------
+
+A map type is uniquely defined by the pair of its key and value types. To
+define a new type, you must use the `DECLARE_MAP` macro in `map.h`, and the
+`IMPLEMENT_MAP` macro in `map.c`. Their usage is described below:
+
+`DECLARE_MAP`::
+
+	Declare a new type of map, including the struct definition and
+	declarations of access functions. The `name` parameter should describe
+	the types (e.g., `object_uint32` to map objects to 32-bit integers).
+	The `ktype` parameter specifies the C type of the key (e.g.,
+	`struct object *`) and the `vtype` parameter specifies the C type of
+	the value (e.g., `uint32_t`).
+
+`IMPLEMENT_MAP`::
+
+	Create function definitions for a map type. The `name` parameter should
+	match one given to `DECLARE_MAP`. The `equal_fun` parameter should
+	specify a function that, when given two items of type `ktype`, will
+	return a non-zero value if they are equal.  The `hash_fun` parameter
+	should specify a function that will convert an object of type `ktype`
+	into an integer hash value.
+
+Several convenience functions are provided to fill in macro parameters:
+
+`hash_obj`::
+
+	Suitable for `hash_fun` when the key type is `struct object *`.
+
+`obj_equal`::
+
+	Suitable for `equal_fun` when the key type is `struct object *`.
+
+
+Data Structures
+---------------
+
+Each defined map type will have its own structure (e.g., `map_object_uint32`).
+
+`struct map_NAME`::
+
+	A single map object. This struct should be initialized to all-zeroes.
+	The `nr` field specifies the number of items stored in the map. The
+	`size` field specifies the number of hash buckets allocated. The `hash`
+	field stores the actual data. Callers should never need to look at
+	these fields unless they are enumerating all elements of the map (see
+	the example below).
+
+`struct map_entry_NAME`::
+
+	A single entry in the hash, which may or may not contain a value. If
+	the `used` field is false, the `key` and `value` fields should not be
+	examined at all. Otherwise, the `key` and `value` fields represent a
+	single mapped pair.  You should never need to use this type directly,
+	unless you are enumerating all elements of a map.
+
+
+Functions
+---------
+
+Each defined map type will have its own set of access function (e.g.,
+`map_get_object_uint32`).
+
+`map_get_NAME(struct map_NAME *, const ktype key, vtype *value)`::
+
+	Retrieve the value corresponding to `key`, returning it via the pointer
+	`value`. Returns 1 if an item was found, zero otherwise (in which case
+	`value` is unchanged).
+
+`map_set_NAME(struct map_NAME *, const ktype key, vtype value, vtype *old)`::
+
+	Insert a mapping from `key` to `value`. If a mapping for `key` already
+	existed, the previous value is copied into `old` (if it is non-NULL)
+	and the function returns 1. Otherwise, the function returns 0.
+
+
+Examples
+--------
+
+Create a new mapping type of objects to integers:
+
+-------------------------------------------------------------------
+/* in map.h */
+DECLARE_MAP(object_int, struct object *, int)
+
+/* in map.c */
+IMPLEMENT_MAP(object_int, struct object *, int, obj_equal, hash_obj)
+-------------------------------------------------------------------
+
+Store and retrieve integers by object key:
+
+-------------------------------------------------------------------
+static struct map_object_int foos;
+
+void store_foo(const struct commit *c, int foo)
+{
+	int old;
+	if (map_set_object_uint32(&foos, &c->object, foo, &old))
+		printf("old value was %d\n", old);
+}
+
+void print_foo(const struct commit *c)
+{
+	int v;
+
+	if (map_get_object_int(&foos, &c->object, &v))
+		printf("foo: %d\n", v);
+	else
+		printf("no such foo\n");
+}
+-------------------------------------------------------------------
+
+Iterate over all map entries:
+
+-------------------------------------------------------------------
+void dump_foos(void)
+{
+	int i;
+
+	printf("there are %u foos:\n", foos.nr);
+
+	for (i = 0; i < foos.size; i++) {
+		struct map_entry_object_int *e = foos.hash[i];
+
+		if (!e->used)
+			continue;
+
+		printf("%s -> %d\n", sha1_to_hex(e->key->sha1), e->value);
+	}
+}
+-------------------------------------------------------------------
diff --git a/Makefile b/Makefile
index 4ed7996..acda5b8 100644
--- a/Makefile
+++ b/Makefile
@@ -533,6 +533,7 @@ LIB_H += list-objects.h
 LIB_H += ll-merge.h
 LIB_H += log-tree.h
 LIB_H += mailmap.h
+LIB_H += map.h
 LIB_H += merge-file.h
 LIB_H += merge-recursive.h
 LIB_H += notes.h
@@ -624,6 +625,7 @@ LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
 LIB_OBJS += mailmap.o
+LIB_OBJS += map.o
 LIB_OBJS += match-trees.o
 LIB_OBJS += merge-file.o
 LIB_OBJS += merge-recursive.o
diff --git a/map.c b/map.c
new file mode 100644
index 0000000..35f300e
--- /dev/null
+++ b/map.c
@@ -0,0 +1,86 @@
+#include "cache.h"
+#include "map.h"
+#include "object.h"
+
+static unsigned int hash_obj(const struct object *obj, unsigned int n)
+{
+	unsigned int hash;
+
+	memcpy(&hash, obj->sha1, sizeof(unsigned int));
+	return hash % n;
+}
+
+static int obj_equal(const struct object *a, const struct object *b)
+{
+	return a == b;
+}
+
+#define IMPLEMENT_MAP(name, equal_fun, hash_fun) \
+static int map_insert_##name(struct map_##name *m, \
+			     const map_ktype_##name key, \
+			     map_vtype_##name value, \
+			     map_vtype_##name *old) \
+{ \
+	unsigned int j; \
+\
+	for (j = hash_fun(key, m->size); m->hash[j].used; j = (j+1) % m->size) { \
+		if (equal_fun(m->hash[j].key, key)) { \
+			if (old) \
+				*old = m->hash[j].value; \
+			m->hash[j].value = value; \
+			return 1; \
+		} \
+	} \
+\
+	m->hash[j].key = key; \
+	m->hash[j].value = value; \
+	m->hash[j].used = 1; \
+	m->nr++; \
+	return 0; \
+} \
+\
+static void map_grow_##name(struct map_##name *m) \
+{ \
+	struct map_entry_##name *old_hash = m->hash; \
+	unsigned int old_size = m->size; \
+	unsigned int i; \
+\
+	m->size = (old_size + 1000) * 3 / 2; \
+	m->hash = xcalloc(m->size, sizeof(*m->hash)); \
+	m->nr = 0; \
+\
+	for (i = 0; i < old_size; i++) { \
+		if (!old_hash[i].used) \
+			continue; \
+		map_insert_##name(m, old_hash[i].key, old_hash[i].value, NULL); \
+	} \
+	free(old_hash); \
+} \
+\
+int map_set_##name(struct map_##name *m, \
+		   const map_ktype_##name key, \
+		   map_vtype_##name value, \
+		   map_vtype_##name *old) \
+{ \
+	if (m->nr >= m->size * 2 / 3) \
+		map_grow_##name(m); \
+	return map_insert_##name(m, key, value, old); \
+} \
+\
+int map_get_##name(struct map_##name *m, \
+		   const map_ktype_##name key, \
+		   map_vtype_##name *value) \
+{ \
+	unsigned int j; \
+\
+	if (!m->size) \
+		return 0; \
+\
+	for (j = hash_fun(key, m->size); m->hash[j].used; j = (j+1) % m->size) { \
+		if (equal_fun(m->hash[j].key, key)) { \
+			*value = m->hash[j].value; \
+			return 1; \
+		} \
+	} \
+	return 0; \
+}
diff --git a/map.h b/map.h
new file mode 100644
index 0000000..cb2d4e2
--- /dev/null
+++ b/map.h
@@ -0,0 +1,26 @@
+#ifndef MAP_H
+#define MAP_H
+
+#define DECLARE_MAP(name, key_type, value_type) \
+typedef key_type map_ktype_##name; \
+typedef value_type map_vtype_##name; \
+struct map_entry_##name { \
+	map_ktype_##name key; \
+	map_vtype_##name value; \
+	unsigned used:1; \
+}; \
+\
+struct map_##name { \
+	unsigned int size, nr; \
+	struct map_entry_##name *hash; \
+}; \
+\
+extern int map_get_##name(struct map_##name *, \
+			  const map_ktype_##name key, \
+			  map_vtype_##name *value); \
+extern int map_set_##name(struct map_##name *, \
+			  const map_ktype_##name key, \
+			  map_vtype_##name value, \
+			  map_vtype_##name *old);
+
+#endif /* MAP_H */
-- 
1.7.6.34.g86521e

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]