René Scharfe <l.s.r@xxxxxx> writes: > So I got curious if such trees might be in popular repos, wrote the patch > below and checked around a bit, but couldn't find any. > > Is there a smarter way to check for duplicates? One that doesn't need > allocations? Perhaps by having a version of tree_entry_extract() that > seeks backwards somehow? I've never looked into seeking backwards in a tree object, but in unpack-trees, I had to deal with this exact problem that a blob 'hello' sorts before 'hello.c' which in turn sorts after a tree 'hello' because of the "implicit slash after the name for an entry for a tree object" rule by introducing the "cache_bottom" hack in the traversal logic to limit how far we must scan back. We may be able to limit the list of "seen recently" names in a similar way. If the tree we are scanning has 'hello' (blob), 'hello.c' and 'hellp', the bottom pointer initially would be at 'hello' (blob), then stay there when we see 'hello.c' (because the next entry might be 'hello' (tree) that would crash with 'hello'), and when we see the entry 'hellp', we know that the entry at the bottom pointer 'hello' (blob) cannot crash with any entry that comes later in the tree object we are scanning, so we can advance the bottom pointer forward. To decide if we can advance the bottom pointer beyond 'hello.c' (blob), we see if 'hello.c' (tree) can appear after the current entry we are looking at (i.e. 'hellp'), and we know it cannot without violating the sort order. So the bottom would move to point at 'hellp' we just saw. If we had 'hello' (tree) instead of 'hellp', when we look at it after looking at 'hello' (blob) and 'hello.c', we scan from the bottom pointer up to the previous entry, which is still pointing at 'hello' (blob), and notice the crash.