Re: [PATCH v3 02/11] Add initial support for many promisor remotes

Christian Couder <christian.couder@xxxxxxxxx> · Mon, 1 Apr 2019 18:41:33 +0200

On Wed, Mar 13, 2019 at 5:09 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
> > +struct promisor_remote *promisor_remote_new(const char *remote_name)
> > +{
>
> Shouldn't this be static?  The config callback that calls this
> function is inside this file.

Yeah, I made it static.

> > +     struct promisor_remote *o;
> > +
> > +     o = xcalloc(1, sizeof(*o));
> > +     o->remote_name = xstrdup(remote_name);
>
> A comment on this later...
>
> > +static struct promisor_remote *promisor_remote_look_up(const char *remote_name,
> > +                                                    struct promisor_remote **previous)
>
> In our codebase, this operation is far more often called "lookup",
> one word, according to "git grep -e look_up \*.h".

Ok, I changed it to "lookup".

> > +{
> > +     struct promisor_remote *o, *p;
> > +
> > +     for (p = NULL, o = promisors; o; p = o, o = o->next)
> > +             if (o->remote_name && !strcmp(o->remote_name, remote_name)) {
> > +                     if (previous)
> > +                             *previous = p;
>
> I think the "previous" thing is for the callers to learn what
> pointer points at the found entry, allowing e.g. an element to be
> inserted just before the found element.

Actually it's to make it easy to move the found element.

> If so, would it make more
> sense to use the more familiar pattern to use
>
>         *previous = &promisors;
>
> here?

If I do that I get an "error: assignment from incompatible pointer
type" as "*previous" is of type "struct promisor_remote *" while
"&promisors" is of type "struct promisor_remote **".

Maybe you mean:

         *previous = promisors;

but I fail to see how that would correctly pass the previous element
when the found one is not the first one.

> That would remove the need to switch on NULL-ness of previous
> in the caller.

In the only caller that passes a non NULL previous, we call
promisor_remote_move_to_tail() which does:

    if (previous)
        previous->next = o->next;
    else
        promisors = o->next ? o->next : o;

So yeah we check the NULL-ness of previous, but if previous has been
set to promisors, then previous->next = o->next will not set promisors
correctly.

I guess we are not here in the case were the familiar pattern you are
thinking about can be applied. Or is there an example, maybe in the
Git source code, that I could learn from?

Another possibility is to just use hashmap as you suggest below or
list.h. It might be a bit wasteful, but the code simplification might
be worth it.

> > diff --git a/promisor-remote.h b/promisor-remote.h
> > new file mode 100644
> > index 0000000000..bfbf7c0f21
> > --- /dev/null
> > +++ b/promisor-remote.h
> > @@ -0,0 +1,17 @@
> > +#ifndef PROMISOR_REMOTE_H
> > +#define PROMISOR_REMOTE_H
> > +
> > +/*
> > + * Promisor remote linked list
> > + * Its information come from remote.XXX config entries.
> > + */
> > +struct promisor_remote {
> > +     const char *remote_name;
> > +     struct promisor_remote *next;
> > +};
>
> Would it make the management of storage easier to make it
>
>         struct promisor_remote {
>                 struct promisor_remote *next;
>                 const char name[FLEX_ARRAY];
>         };
>
> that will allow allocation with
>
>         struct promisor_remote *r;
>         FLEX_ALLOC_STR(r, name, remote_name);

Ok to use a flex array. If we ever use arrays or hashmaps of promisor
remotes, we might have to go back to not using one.

> Or if the remote_name field must be a pointer, perhaps use
> FLEXPTR_ALLOC_STR().

[...]

> Can the name of promisor be any string?  If they end up getting used
> as part of a path on the filesystem, we'd need to worry about case
> sensitivity and UTF-8 normalization issues as well.

It looks like for regular remotes we only check if they start with /.
So I don't think we need to do more than that for promisor remotes. I
added the check.

> In a large enough project where multi-promisor makes sense, what is
> the expected number of promisors a repository would define?  10s?
> 1000s?  Would a linked list still make sense when deployed in the
> real world, or would we be forced to move to something like hashmap
> later?

I am ok to use hashmap to make it similar with regular remotes.

For now I don't expect large projects to use more than 10s promisors
though. They are defined in the config file and I don't think people
will be happy if they have to manage more than 10s promisors in their
config file. If people really start to use more than that, they are
likely to ask us for a new mechanism to manage them (and to
automatically have them configured from servers). So maybe we can
change that if/when we have to work on such mechanism.

> You do not have to have the answers to all these questions, and even
> the ones with concrete answers, you do not necessarily have to act
> on them right now (e.g. you may anticipate the eventual need to move
> to hashmap, but prototyping with linked list is perfectly fine;
> being aware of the possibility alone would force us to be careful to
> make sure that the implementation detail does not leak through too
> much and confined within _lookup(), _find(), etc. functions, and
> that awareness is good enough at this point).
>
> Thanks.