On Sat, 07 Nov 2015 at 00:38:00, Junio C Hamano wrote: > [...] > I suspect that many of these that turn into ".have"s point the same > object as those sit at the tip of our refs (e.g. tags in alternates > we borrow from, which are the folks of the same project that copy > the same tags from the same upstream). I wonder if it easy to filter > them out? After all, if we say object X sits at refs/tags/v1.0 > there is no point showing ".have" for that same object X. > [...] It is certainly doable but we would need a better infrastructure. The easiest implementation (i.e. the one that is closest to what we currently do) is adding a second SHA1 array to collect all the hashes we already advertised. We could then pass a struct of both SHA1 arrays as second parameter to for_each_ref() and always insert the object identifier into either of the arrays in show_ref_cb(), depending on whether we immediately advertise it as a ref or decide convert it into a "have" line. To transmit "have" lines, instead of calling sha1_array_for_each_unique(), we would then call some function that prints unique entries from the first array that do not appear in the second array. Something like this (fully untested): -- 8< -- void sha1_array_for_each_diff(struct sha1_array *array1, struct sha1_array *array2, for_each_sha1_fn fn, void *data) { int i, j; if (!array1->sorted) sha1_array_sort(array1); if (!array2->sorted) sha1_array_sort(array2); for (i = j = 0; i < array1->nr; i++) { if (i > 0 && !hashcmp(array1->sha1[i], array1->sha1[i - 1])) continue; while (j < array2->nr && hashcmp(array1->sha1[i], array2->sha1[j]) > 0) j++; if (j < array2->nr && !hashcmp(array1->sha1[i], array2->sha1[j])) continue; fn(array1->sha1[i], data); } } -- >8 -- If we decide to implement all that, however, I wonder whether we should use a more sophisticated data structure instead. Currently, if there are n refs outside the current namespace pointing all to the same object, we append n entries to the SHA1 array and sort them afterwards which requires O(n) space. If we would maintain a set of hashes instead of an array in the first place, we could reduce space usage significantly by only storing what is needed (O(1) in the scenario I mentioned). We could also add markers to those object identifiers we already advertised instead of maintaining two separate lists. Unfortunately, in order to keep the running time of O(n log n), we would probably need something like self-balancing binary search trees. Alternatively, a hash map based approach could be taken. Is something like this already used anywhere in the Git source tree such that we can borrow the required data structures and functions? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html