On Tue, May 01 2018, Derrick Stolee wrote: > From: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> > > Here is what I mean by sorting during for_each_abbrev(). This seems to work for > me, so I don't know what the issue is with this one-pass approach. > [...] > +static int sort_ambiguous(const void *a, const void *b) > +{ > + int a_type = oid_object_info(a, NULL); > + int b_type = oid_object_info(b, NULL); > + int a_type_sort; > + int b_type_sort; > + > + /* > + * Sorts by hash within the same object type, just as > + * oid_array_for_each_unique() would do. > + */ > + if (a_type == b_type) > + return oidcmp(a, b); > + > + /* > + * Between object types show tags, then commits, and finally > + * trees and blobs. > + * > + * The object_type enum is commit, tree, blob, tag, but we > + * want tag, commit, tree blob. Cleverly (perhaps too > + * cleverly) do that with modulus, since the enum assigns 1 to > + * commit, so tag becomes 0. > + */ > + a_type_sort = a_type % 4; > + b_type_sort = b_type % 4; > + return a_type_sort > b_type_sort ? 1 : -1; > +} > + > static int get_short_oid(const char *name, int len, struct object_id *oid, > unsigned flags) > { > @@ -451,6 +479,9 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data) > find_short_object_filename(&ds); > find_short_packed_object(&ds); > > + QSORT(collect.oid, collect.nr, sort_ambiguous); > + collect.sorted = 1; > + Yes this works. You're right. I wasn't trying to intentionally omit stuff in my recent 878t93zh60.fsf@xxxxxxxxxxxxxxxxxxx, I'd just written this code some days ago and forgotten why I did what I was doing (and this is hard to test for), but it's all coming back to me now. The actual requirement for oid_array_for_each_unique() working properly is that you've got to feed it in hash order, but my new sort_ambiguous() still does that (barring any SHA-1 collisions, at which point we have bigger problems), so two passes aren't needed. So yes, this apporoach works and is one-pass. But that's just an implementation detail of the current sort method, when I wrote this I was initially playing with other sort orders, e.g. sorting SHAs regardless of type by the mtime of the file I found them in. With this approach I'd start printing duplicates if I changed the internals of sort_ambiguous() like that. But I think it's extremely implausible that we'll start sorting things like that, so I'll just take this method of doing it and add some comment saying we must hashcmp() the entries in our own sort function for the de-duplication to work, I don't see us ever changing that.