"Eric W. Biederman" <ebiederm@xxxxxxxxx> writes: > Linus Arver <linusa@xxxxxxxxxx> writes: > >> "Eric W. Biederman" <ebiederm@xxxxxxxxx> writes: >> >>> From: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> >>> >>> While looking at how to handle input of both SHA-1 and SHA-256 oids in >>> get_oid_with_context, I realized that the oid_array in >>> repo_for_each_abbrev might have more than one kind of oid stored in it >>> simultaneously. >>> >>> Update to oid_array_append to ensure that oids added to an oid array >> >> s/Update to/Update >> >>> always have an algorithm set. >>> >>> Update void_hashcmp to first verify two oids use the same hash algorithm >>> before comparing them to each other. >>> >>> With that oid-array should be safe to use with different kinds of >> >> s/oid-array/oid_array >> >>> oids simultaneously. >>> >>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> >>> --- >>> oid-array.c | 12 ++++++++++-- >>> 1 file changed, 10 insertions(+), 2 deletions(-) >>> >>> diff --git a/oid-array.c b/oid-array.c >>> index 8e4717746c31..1f36651754ed 100644 >>> --- a/oid-array.c >>> +++ b/oid-array.c >>> @@ -6,12 +6,20 @@ void oid_array_append(struct oid_array *array, const struct object_id *oid) >>> { >>> ALLOC_GROW(array->oid, array->nr + 1, array->alloc); >>> oidcpy(&array->oid[array->nr++], oid); >>> + if (!oid->algo) >>> + oid_set_algo(&array->oid[array->nr - 1], the_hash_algo); >> >> How come we can't set oid->algo _before_ we call oidcpy()? It seems odd >> that we do the copy first and then modify what we just copied after the >> fact, instead of making sure that the thing we want to copy is correct >> before doing the copy. >> >> But also, if we are going to make the oid object "correct" before >> invoking oidcpy(), we might as well do it when the oid is first >> created/used (in the caller(s) of this function). I don't demand that >> you find/demonstrate where all these places are in this series (maybe >> that's a hairy problem to tackle?), but it seems cleaner in principle to >> fix the creation of oid objects instead of having to make oid users >> clean up their act like this after using them. > > There is a hairy problem here. > > I believe for reasons of simplicity when the algo field was added to > struct object_id it was allowed to be zero for users that don't > particularly care about the hash algorithm, and are happy to use the git > default hash algorithm. > > Me experience working on this set of change set showed that there > are oids without their algo set in all kinds of places in the tree. Ah, I see. Thanks for the clarification. > I could not think of any sure way to go through the entire tree > and find those users, so I just made certain that oid array handled > that case. > > I need algo to be set properly in the oids in the oid array so I > could extend oid_array to hold multiple kinds of oids at the same > time. To allow multiple kinds of oids at the same time void_hashcmp > needs a simple and reliable way to tell what the algorithm is of > any given oid. Makes sense. >> >>> array->sorted = 0; >>> } >>> >>> -static int void_hashcmp(const void *a, const void *b) >>> +static int void_hashcmp(const void *va, const void *vb) >>> { >>> - return oidcmp(a, b); >>> + const struct object_id *a = va, *b = vb; >>> + int ret; >>> + if (a->algo == b->algo) >>> + ret = oidcmp(a, b); >> >> This makes sense (per the commit message description) ... >> >>> + else >>> + ret = a->algo > b->algo ? 1 : -1; >> >> ... but this seems to go against it? I thought you wanted to only ever >> compare hashes if they were of the same algo? It would be good to add a >> comment explaining why this is OK (we are no longer doing a byte-by-byte >> comparison of these oids any more here like we do for oidcmp() above >> which boils down to calling memcmp()). > > So the goal of this change is for oid_array to be able to hold hashes > from multiple algorithms at the same time. > > A key part of oid_array is oid_array_sort that allows functions such > as oid_array_lookup and oid_array_for_each_unique. > > To that end there needs to be a total ordering of oids. > > The function oidcmp is only defined when two oids are of the same > algorithm, it does not even test to detect the case of comparing > mismatched algorithms. > > Therefore to get a total ordering of oids. I must use oidcmp > when the algorithm is the same (the common case) or simply order > the oids by algorithm when the algorithms are different. > > > > All of this is relevant to get_oid_with_context as get_oid_with_context > and it's helper functions contain the logic that determines what > we do when a hex string that is ambiguous is specified. > > In the ambiguous case all of the possible candidates are placed in > an oid_array, sorted and then displayed. > > > With a repository that can knows both the sha1 and the sha256 oid > of it's objects it is possible for a short oid to match both > some sha1 oids and some sha256 oids. Thanks for the additional clarification. I think a lot of this could have been added as comments or perhaps in the commit message. The "short id can match both sha1 or sha256" is a very real scenario we need to consider in the sha1+sha256 world, indeed. >>> + return ret; >> >> Also, in terms of style I think the "early return for errors" style >> would be simpler to read. I.e. >> >> if (a->algo > b->algo) >> return 1; >> >> if (a->algo < b->algo) >> return -1; >> >> return oidcmd(a, b); >> > > I can see doing: > if (a->algo == b->algo) > return oidcmp(a,b); > > if (a->algo > b->algo) > return 1; > else > return -1; > > Or even: > if (a->algo == b->algo) > return oidcmp(a,b); > > return a->algo - b->algo; > > Although I suspect using subtraction is a bit too clever. Agreed. > Comparing for less than, and greater than, and then assuming > the values are equal hides what is important before calling > oidcmp which is that the algo values are equal. I would still prefer the "early return for errors" style even in this case. This is because I much prefer to have the question "how can things go wrong?" answered first, and dealt with, such that as I read top-to-bottom I am left with less things I have to consider to understand the "happy path". WRT emphasizing the "algos equal each other" concern, a simple comment like /* Only compare equal algorithms. */ return oidcmp(a, b); seems sufficient. But, of course it is possible (perhaps even likely) that my preferred style is in the minority. Up to you. Thanks.