[PATCH 1/6] Introduce sorted-array binary-search function.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



We use a cpp-based template mechanism to declare the array and its
management data, as well as a search function.
Thanks to Jonathan Nieder for this design idea.

Signed-off-by: Yann Dirson <ydirson@xxxxxxxxxx>
---
 Documentation/technical/api-sorted-array.txt |  132 ++++++++++++++++++++++++++
 Makefile                                     |    1 +
 sorted-array.h                               |  128 +++++++++++++++++++++++++
 3 files changed, 261 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/technical/api-sorted-array.txt
 create mode 100644 sorted-array.h

diff --git a/Documentation/technical/api-sorted-array.txt b/Documentation/technical/api-sorted-array.txt
new file mode 100644
index 0000000..b1238ad
--- /dev/null
+++ b/Documentation/technical/api-sorted-array.txt
@@ -0,0 +1,132 @@
+sorted-array API
+================
+
+The sorted-array API is meant to provide efficient binary-search
+functions into a C array of elements of arbitrary type, and insert
+functions that maintain ordering.
+
+It is meant for data structures where lookup time is much more
+important than insertion type: while lookup has o(log(n)) time
+complexity, insertion has o(n) and involves many copies.
+
+API overview
+------------
+
+The API allow to declare all variables and function as static or not,
+through the first argument to all macros, which can be `static` or
+empty.
+
+It is based on the idea of providing custom functions, which will then
+be used by generic algorithms:
+* one for comparing a search term with an array item
+* one for initializing a new array item from a search term after insertion
+
+A set of convenience wrapper macros are provided, to define a readable
+API for inserting and sorting.
+
+Note that the type of the search term can be different from the type
+of array elements (which can be more expensive to create).
+
+API reference
+-------------
+
+The macros can be categorized as follows:
+
+* the one used to define a sorted array, and its management variable
+  holding currently-allocated number of elements and number of
+  elements effectively used.  The name of those (int) management
+  variables are crafted by using the `LISTNAME` and appending
+  respectively `_alloc` and `_nr`.
++
+----
+declare_sorted_array(MAYBESTATIC,ELEMTYPE,LISTNAME)
+----
++
+eg:
++
+----
+declare_sorted_array(static, struct diff_rename_dst, rename_dst);
+----
+
+* those defining ready-for-use search and insert functions.  They
+  exist in several flavours, distinguished by their suffix, which
+  refer to different return-type semantics.
++
+----
+declare_sorted_array_search_*(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,LISTNAME,CMP)
+declare_sorted_array_insert_*(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,GENSEARCH,LISTNAME,CMP,INIT)
+----
++
+They all work by defining behind the scene a (static) generic search or insert
+function named with `_gen_` prepended to `FUNCNAME`.  Such generic functions
+may be useful to know about if you need to define an insert function with
+the exact same parameters.
++
+All functions of a given type (search or insert) take the same kind of
+arguments:
+
+`MAYBESTATIC`::
+	the ubiquitous `static`or empty placeholder
+`ELEMTYPE`::
+	type of element to be stored in the array
+`FUNCNAME`::
+	name of the function to be defined
+`INITTYPE`::
+	type of search term / initializer passed as argument to the
+	generated function
+`LISTNAME`::
+	name of the array in which the search takes place, usually declared
+	through `declare_sorted_array()`
+`CMP`::
+	comparison function taking an INITTYPE argument and a
+	pointer to an ELEMTYPE argument, and returning an int with
+	the same meaning as `strcmp()`.
+`GENSEARCH`::
+	(insert only) generic search function generated by
+	`declare_gen_binsearch()`
+`INIT`::
+	(insert only) element-initializer function taking as arguments
+	pointer to the ELEMTYPE to be initialized, and an INITTYPE argument
+	to take information from
+
+
+Suffix meanings are as follows:
+
+`check`::
+
+	Returns the position of the element if found pre-existing in
+	the list, or if not found, (-pos - 1) where pos is where the
+	element would have been (for search funcs) or has been (for
+	insert funcs) inserted.
+
+`checkbool`::
+
+	Returns an int, just telling whether the searched element was
+	pre-existing in the list (1) or not (0).
+
+`elem`::
+
+	Returns NULL if the search term was not found (for search
+	funcs), or a pointer to the element found or newly inserted.
+
+* those defining even-more-ready-for-use insert functions, which do
+  not need to define the generic search function at all.  Those macro
+  names are the same as the forms with an `GENSEARCH` argument, but
+  with a name ending in `_insertonly_*`, and without that `GENSEARCH`
+  argument.  The generic (static) search functions are named as
+  `_gensearch` prepended to the `FUNCNAME` argument.
+
+* those defining the generic algorithms
++
+They rarely need to be used directly, they are usually expanded
+transparently from the `declare_sorted_array_*` macros.
++
+----
+declare_gen_binsearch(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE)
+declare_gen_sorted_insert(MAYBESTATIC,ELEMTYPE,FUNCNAME,SEARCHFUNC,INITTYPE)
+----
+
+* those defining the individual wrappers
++
+They rarely need to be used directly, they are usually expanded
+transparently from the `declare_sorted_array_*` macros.
diff --git a/Makefile b/Makefile
index 7eb948d..522261e 100644
--- a/Makefile
+++ b/Makefile
@@ -544,6 +544,7 @@ LIB_H += run-command.h
 LIB_H += sha1-lookup.h
 LIB_H += sideband.h
 LIB_H += sigchain.h
+LIB_H += sorted-array.h
 LIB_H += strbuf.h
 LIB_H += string-list.h
 LIB_H += submodule.h
diff --git a/sorted-array.h b/sorted-array.h
new file mode 100644
index 0000000..added0e
--- /dev/null
+++ b/sorted-array.h
@@ -0,0 +1,128 @@
+#ifndef SORTED_ARRAY_H_
+#define SORTED_ARRAY_H_
+
+/*
+ * Sorted-array management.
+ * See Documentation/technical/api-sorted-array.txt
+ */
+
+#define declare_sorted_array(MAYBESTATIC,ELEMTYPE,LIST)			\
+MAYBESTATIC ELEMTYPE *LIST;						\
+MAYBESTATIC int LIST##_nr, LIST##_alloc;
+
+#define declare_gen_binsearch(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE)	\
+MAYBESTATIC int FUNCNAME(						\
+	ELEMTYPE *list, int list_nr,					\
+	int(*cmp_func)(INITTYPE ref, ELEMTYPE *elem),			\
+	INITTYPE data)							\
+{									\
+	int lo, hi;							\
+									\
+	lo = 0;								\
+	hi = list_nr;							\
+	while (hi > lo) {						\
+		int mid = (hi + lo) >> 1;				\
+		int cmp = cmp_func(data, list + mid);			\
+		if (!cmp)						\
+			return mid;					\
+		if (cmp < 0)						\
+			hi = mid;					\
+		else							\
+			lo = mid + 1;					\
+	}								\
+	return -lo - 1;							\
+}
+
+#define declare_gen_sorted_insert(MAYBESTATIC,ELEMTYPE,FUNCNAME,SEARCHFUNC,INITTYPE) \
+MAYBESTATIC int FUNCNAME(						\
+	ELEMTYPE **list_p, int *list_nr_p, int *list_alloc_p,		\
+	int(*cmp_func)(INITTYPE ref, ELEMTYPE *elem),			\
+	void(*init_func)(ELEMTYPE *elem, INITTYPE init),		\
+	INITTYPE data)							\
+{									\
+	int pos = SEARCHFUNC(*list_p, *list_nr_p, cmp_func, data);	\
+	if (pos >= 0) 							\
+		return pos;						\
+	/* not found */							\
+	pos = -pos - 1;							\
+	/* insert to make it at "pos" */				\
+	if (*list_alloc_p <= *list_nr_p) {				\
+		(*list_alloc_p) = alloc_nr((*list_alloc_p));		\
+		*list_p = xrealloc(*list_p,				\
+				   (*list_alloc_p) * sizeof(**list_p)); \
+	}								\
+	(*list_nr_p)++;							\
+	if (pos < *list_nr_p)						\
+		memmove(*list_p + pos + 1, *list_p + pos,		\
+			(*list_nr_p - pos - 1) * sizeof(**list_p));	\
+	init_func(&(*list_p)[pos], data);				\
+	return -pos - 1;						\
+}
+
+#define _declare_sorted_array_search_check(MAYBESTATIC,FUNCNAME,INITTYPE,GENSEARCH,LIST,CMP) \
+MAYBESTATIC int FUNCNAME(INITTYPE data)					\
+{									\
+	return GENSEARCH(LIST, LIST##_nr, CMP, data);			\
+}
+#define declare_sorted_array_search_check(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,LIST,CMP) \
+  declare_gen_binsearch(MAYBESTATIC,ELEMTYPE,_gen_##FUNCNAME,INITTYPE); \
+  _declare_sorted_array_search_check(MAYBESTATIC,FUNCNAME,INITTYPE,_gen_##FUNCNAME,LIST,CMP);
+
+#define _declare_sorted_array_insert_check(MAYBESTATIC,FUNCNAME,INITTYPE,GENINSERT,LIST,CMP,INIT) \
+MAYBESTATIC int FUNCNAME(INITTYPE data)					\
+{									\
+	return GENINSERT(&LIST, &LIST##_nr, &LIST##_alloc,		\
+			 CMP, INIT, data);				\
+}
+#define declare_sorted_array_insert_check(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,GENSEARCH,LIST,CMP,INIT) \
+  declare_gen_sorted_insert(static,ELEMTYPE,_gen_##FUNCNAME,GENSEARCH,INITTYPE); \
+  _declare_sorted_array_insert_check(MAYBESTATIC,FUNCNAME,INITTYPE,_gen_##FUNCNAME,LIST,CMP,INIT);
+#define declare_sorted_array_insertonly_check(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,LIST,CMP,INIT) \
+  declare_gen_binsearch(static,ELEMTYPE,_gensearch_##FUNCNAME,INITTYPE); \
+  declare_sorted_array_insert_check(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,_gensearch_##FUNCNAME,LIST,CMP,INIT);
+
+#define _declare_sorted_array_insert_checkbool(MAYBESTATIC,FUNCNAME,INITTYPE,GENINSERT,LIST,CMP,INIT) \
+MAYBESTATIC int FUNCNAME(INITTYPE data)					\
+{									\
+	int idx = GENINSERT(&LIST, &LIST##_nr, &LIST##_alloc,		\
+			    CMP, INIT, data);				\
+	if (idx < 0)							\
+		return 0;						\
+	return 1;							\
+}
+#define declare_sorted_array_insert_checkbool(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,GENSEARCH,LIST,CMP,INIT) \
+  declare_gen_sorted_insert(static,ELEMTYPE,_gen_##FUNCNAME,GENSEARCH,INITTYPE); \
+  _declare_sorted_array_insert_checkbool(MAYBESTATIC,FUNCNAME,INITTYPE,_gen_##FUNCNAME,LIST,CMP,INIT);
+#define declare_sorted_array_insertonly_checkbool(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,LIST,CMP,INIT) \
+  declare_gen_binsearch(static,ELEMTYPE,_gensearch_##FUNCNAME,INITTYPE); \
+  declare_sorted_array_insert_checkbool(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,_gensearch_##FUNCNAME,LIST,CMP,INIT);
+
+#define _declare_sorted_array_search_elem(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,GENSEARCH,LIST,CMP) \
+MAYBESTATIC ELEMTYPE *FUNCNAME(INITTYPE data)				\
+{									\
+	int idx = GENSEARCH(LIST, LIST##_nr, CMP, data);		\
+	if (idx < 0)							\
+		return NULL;						\
+	return &(LIST[idx]);						\
+}
+#define declare_sorted_array_search_elem(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,LIST,CMP) \
+  declare_gen_binsearch(static,ELEMTYPE,_gen_##FUNCNAME,INITTYPE); \
+  _declare_sorted_array_search_elem(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,_gen_##FUNCNAME,LIST,CMP);
+
+#define _declare_sorted_array_insert_elem(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,GENINSERT,LIST,CMP,INIT) \
+MAYBESTATIC ELEMTYPE *FUNCNAME(INITTYPE data)				\
+{									\
+	int idx = GENINSERT(&LIST, &LIST##_nr, &LIST##_alloc,		\
+			    CMP, INIT, data);				\
+	if (idx < 0)							\
+		idx = -idx - 1;						\
+	return &(LIST[idx]);						\
+}
+#define declare_sorted_array_insert_elem(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,GENSEARCH,LIST,CMP,INIT) \
+  declare_gen_sorted_insert(static,ELEMTYPE,_gen_##FUNCNAME,GENSEARCH,INITTYPE); \
+  _declare_sorted_array_insert_elem(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,_gen_##FUNCNAME,LIST,CMP,INIT);
+#define declare_sorted_array_insertonly_elem(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,LIST,CMP,INIT) \
+  declare_gen_binsearch(static,ELEMTYPE,_gensearch_##FUNCNAME,INITTYPE); \
+  declare_sorted_array_insert_elem(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,_gensearch_##FUNCNAME,LIST,CMP,INIT);
+
+#endif
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]