On Fri, Dec 22, 2023 at 12:13:10AM +0100, René Scharfe wrote: > >> Should we deduplicate here, like cat-file does (i.e. use "sort -u")? > >> Having the same object in multiple places for whatever reason would not > >> be a cause for reporting an error in this test, I would think. > > > > No, for the reasons I said in the commit message: if an object exists in > > multiple places the test is already potentially invalid, as Git does not > > promise which version it will use. So it might work racily, or it might > > work for now but be fragile. By not de-duplicating, we make sure the > > test's assumption holds. > > Oh, skipped that paragraph. Still I don't see how a duplicate object > would necessarily invalidate t1006. The comment for the test "cat-file > --batch-all-objects shows all objects" a few lines above indicates that > it's picky about the provenance of objects, but it uses a separate > repository. I can't infer the same requirement for the root repo, but > we already established that I can't read. The cat-file documentation explicitly calls this situation out: Note also that multiple copies of an object may be present in the object database; in this case, it is undefined which copy’s size or delta base will be reported. So if t1006 were to grow such a duplicate object, what will happen? If we de-dup in the new test, then we might end up mentioning the same copy (and the test passes), or we might not (and the test fails). But much worse, the results might be racy (depending on how cat-file happens to decide which one to use). By no de-duping, then the test will reliably fail and the author can decide how to handle it then. IOW it is about failing immediately and predictably rather than letting a future change to sneak a race or other accident-waiting-to-happen into t1006. > Anyway, if someone finds a use for git repack without -d or > git unpack-objects or whatever else causes duplicates in the root > repository of t1006 then they can try to reverse your ban with concrete > arguments. In the real world, the most common way to get a duplicate is to fetch or push into a repository, such that: 1. There are enough objects to retain the pack (100 by default) 2. There's a thin delta in the on-the-wire pack (i.e., a delta against a base that the sender knows the receiver has, but which isn't itself sent). Then "index-pack --fix-thin" will complete the on-disk pack by storing a copy of the base object in it. And now we have it in two packs (and if it's a delta or loose in the original, the size will be different). -Peff