Re: [PATCH 4/9] get_short_oid: sort ambiguous objects by type, then SHA-1

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 01 May 2018 13:27:17 +0200

On Tue, May 01 2018, Derrick Stolee wrote:

> On 4/30/2018 6:07 PM, Ævar Arnfjörð Bjarmason wrote:
>> Since we show the commit data in the output that's nicely aligned once
>> we sort by object type. The decision to show tags before commits is
>> pretty arbitrary, but it's much less likely that we'll display a tag,
>> so if there is one it makes sense to show it first.
>
> Here's a non-arbitrary reason: the object types are ordered
> topologically (ignoring self-references):
>
> tag -> commit, tree, blob
> commit -> tree
> tree -> blob

Thanks. I'll add a patch with that comment to v2.

>> @@ -421,7 +451,12 @@ static int get_short_oid(const char *name, int len, struct object_id *oid,
>>   			ds.fn = NULL;
>>     		advise(_("The candidates are:"));
>> -		for_each_abbrev(ds.hex_pfx, show_ambiguous_object, &ds);
>> +		for_each_abbrev(ds.hex_pfx, collect_ambiguous, &collect);
>> +		QSORT(collect.oid, collect.nr, sort_ambiguous);
>
> I was wondering how the old code sorted by SHA even when the ambiguous
> objects were loaded from different sources (multiple pack-files, loose
> objects). Turns out that for_each_abbrev() does its own sort after
> collecting the SHAs and then calls the given function pointer only
> once per distinct object. This avoids multiple instances of the same
> object, which may appear multiple times across pack-files.
>
> I only ask because now we are doing two sorts. I wonder if it would be
> more elegant to provide your sorting algorithm to for_each_abbrev()
> and let it call show_ambiguous_object as before.
>
> Another question is if we should use this sort generally for all calls
> to for_each_abbrev(). The only other case I see is in
> builtin/revparse.c.

When preparing v2 I realized how confusing this was, so I'd added this
to the commit message of my WIP re-roll which should explain this:

    A note on the implementation: I started out with something much
    simpler which just replaced oid_array_sort() in sha1-array.c with a
    custom sort function before calling oid_array_for_each_unique(). But
    then dumbly noticed that it doesn't work because the output function
    was tangled up with the code added in fad6b9e590 ("for_each_abbrev:
    drop duplicate objects", 2016-09-26) to ensure we don't display
    duplicate objects.

    That's why we're doing two passes here, first we need to sort the list
    and de-duplicate the objects, then sort them in our custom order, and
    finally output them without re-sorting them. I suppose we could also
    make oid_array_for_each_unique() maintain a hashmap of emitted
    objects, but that would increase its memory profile and wouldn't be
    worth the complexity for this one-off use-case,
    oid_array_for_each_unique() is used in many other places.