Re: [PATCH 6/6] [RFC] subvert sorted-array to replace binary-search in unpack-objects.

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 10 Dec 2010 15:00:40 -0800

Yann Dirson <ydirson@xxxxxxxxxx> writes:

> Signed-off-by: Yann Dirson <ydirson@xxxxxxxxxx>
> ---
>  builtin/unpack-objects.c |   40 +++++++++++++++++++++++++---------------
>  1 files changed, 25 insertions(+), 15 deletions(-)
>
> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index f63973c..6d7d113 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -157,7 +158,24 @@ struct obj_info {
>  #define FLAG_OPEN (1u<<20)
>  #define FLAG_WRITTEN (1u<<21)
>  
> -static struct obj_info *obj_list;
> +/*
> + * FIXME: obj_info is a sorted array, but we read it as a whole, we
> + * don't need insertion features.  This allows us to abuse unused
> + * obj_info_nr later as a means of specifying an upper bound for
> + * binary search.  obj_info_alloc shall be eliminated by the compiler
> + * as unused.
> + */

I was scratching my head when I first read "subvert" on your Subject line
and the FIXME above, but after thinking about it, I think I got it.  More
importantly, I think you have noticed the same "too rigid and brittle"
problem I mentioned in my response to [1/6] earlier, if not the
"overengineered" part.

As the pack stream is read in, obj_list is built into an array that is
sorted by its "offset" field up to the "nr"-th element.  And assigning the
current number of elements in the array to obj_list_nr is not a "kludge to
bound the search" as you said in the comment, but is the right thing to do
given the structure of your API.  "nr" is "up to this index the array is
filled and used", "alloc" is "this many are allocated", and at the point of
that assignment, "nr" is indeed what it is.

The only reason it might seem kludgy is because the API is not designed to
anticipate that there is a way to add new elements at the end by feeding
elements in the already sorted order, and that facility does so without
calling the functions your API autogenerates.
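
To illustrate the "nr"/"alloc" convention described above, here is a
minimal sketch.  It assumes a simplified struct obj_info that has only an
"offset" field, and append_obj() and find_obj() are hypothetical names
made up for this example, not functions in unpack-objects.c.  Entries are
appended in already sorted order, and the binary search is bounded by "nr".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct obj_info. */
struct obj_info {
	unsigned long offset;
};

static struct obj_info *obj_list;
static size_t obj_list_nr;	/* entries filled and in use */
static size_t obj_list_alloc;	/* entries allocated */

/*
 * Append one entry at the end.  The caller must feed offsets in
 * strictly increasing order, so the array stays sorted without any
 * insertion logic.
 */
static void append_obj(unsigned long offset)
{
	assert(!obj_list_nr || obj_list[obj_list_nr - 1].offset < offset);
	if (obj_list_nr == obj_list_alloc) {
		obj_list_alloc = obj_list_alloc ? 2 * obj_list_alloc : 16;
		obj_list = realloc(obj_list,
				   obj_list_alloc * sizeof(*obj_list));
		if (!obj_list)
			abort();
	}
	obj_list[obj_list_nr++].offset = offset;
}

/* Binary search over the first "nr" entries; -1 when not found. */
static long find_obj(unsigned long offset)
{
	size_t lo = 0, hi = obj_list_nr;

	while (lo < hi) {
		size_t mi = lo + (hi - lo) / 2;

		if (obj_list[mi].offset == offset)
			return (long)mi;
		if (obj_list[mi].offset < offset)
			lo = mi + 1;
		else
			hi = mi;
	}
	return -1;
}
```

With this shape, "nr" is the natural upper bound for the search, and
nothing needs to be abused to obtain it.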

I think the most bothersome repetition with the current codebase around
binary searchable tables is the binary search loops.  Perhaps introducing
a macro that lets us write them in a more structured way, without trying
to build elaborate top-level declarations that do everything (and
failing to do so), may give you a better payback?
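
One possible shape for such a macro, purely as a sketch: BINSEARCH(),
cmp_ulong() and find_offset() below are hypothetical names invented for
this example, not existing helpers in the codebase.  The macro owns only
the loop; the caller supplies the comparison as an expression and keeps
its own table declaration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical macro.  Searches the first "nr" entries of a sorted
 * table and stores the index of the match in "pos", or -1 when there
 * is none.  "mi" names the probe index variable, and "cmp" is an
 * expression using it that is negative, zero or positive when the key
 * sorts before, at or after the probed entry.
 */
#define BINSEARCH(pos, nr, mi, cmp) do { \
	size_t lo_ = 0, hi_ = (nr); \
	(pos) = -1; \
	while (lo_ < hi_) { \
		size_t mi = lo_ + (hi_ - lo_) / 2; \
		int cmp_ = (cmp); \
		if (!cmp_) { \
			(pos) = (long)mi; \
			break; \
		} \
		if (cmp_ < 0) \
			hi_ = mi; \
		else \
			lo_ = mi + 1; \
	} \
} while (0)

static int cmp_ulong(unsigned long a, unsigned long b)
{
	return a < b ? -1 : a > b;
}

/* Example caller: look up "key" in a sorted array of offsets. */
static long find_offset(const unsigned long *offsets, size_t nr,
			unsigned long key)
{
	long pos;

	BINSEARCH(pos, nr, mi, cmp_ulong(key, offsets[mi]));
	return pos;
}
```

A caller that wants to bound the search simply passes a smaller "nr".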