On Sat, Apr 08, 2017 at 04:06:41PM +0200, René Scharfe wrote: > > + } else if (i > 1 && are_same_oid(&names[i], &names[i-2])) { > > + /* implicitly borrow buf[i-2] inside tree_desc[i] */ > > + memcpy(&t[i], &t[i-2], sizeof(struct tree_desc)); > > Similar case. > > > + buf[i] = NULL; > > What makes the previous two entries special, or differently put: Why not > just check *all* previous entries? MAX_UNPACK_TREES is 8; the number of > comparisons would just about double in the worst case and stay the same for > three trees or less. The order of four trees or more wouldn't matter > anymore. If I understand correctly, we've got N (up to 8) trees, and we want to find sets of duplicates. Adjacency doesn't matter. Aren't our options basically: - look through all previous entries naively, O(N^2) - sort the list, O(N log N); but we need the original order, so we'd have to unsort at the end, too - use a hash table, O(N) but with O(N) extra storage I know N=8 isn't very big algorithmically speaking, but it would feel ugly to introduce something quadratic. Especially the limit of 8 seems rather arbitrary. In normal cases we see at most a 3-way merge. Beyond that we're in octopus territory, and 8 is way too low there (I think the octopus strategy just unpacks one at a time and barfs if there are any conflicts). I assume the rationale behind "check the last 2" is just "this optimization will kick in reliably for all sane cases", weird octopus unpack-trees be damned (though reading ca885a4fe, it sounds like 4-trees can actually happen, but I'm not clear on how). -Peff