On Wed, 14 Aug 2013, Jeff King wrote:

>   1. Is sha1_entry_pos wrong to barf on duplicate items in the index? If
>      so, do we want to fix it, or simply retire GIT_USE_LOOKUP?

I'd think that sha1_entry_pos should be more lenient here, especially if
this doesn't compromise the overall git behavior.

>      Related, should we consider duplicate items in a packfile to be a
>      bogus packfile (and consequently notice and complain during
>      indexing)? I don't think it _hurts_ anything (aside from the assert
>      above), though it is of course wasteful.

This should indeed be considered a bogus pack file.  But we already have
a lot of code that copes with bogus/corrupted pack files, so handling
this case as well would not hurt.

More importantly, we should make sure the code we have doesn't generate
such packs.

>   2. How can duplicate entries get into a packfile?
>
>      Git itself should not generate duplicate entries (pack-objects is
>      careful to remove duplicates). Since these packs almost certainly
>      were pushed by a client, I wondered if "index-pack --fix-thin"
>      might accidentally add multiple copies of an object when it is the
>      preferred base for multiple objects, but it specifically avoids
>      doing so.

It is probably simpler than that.  An alternative pack-objects
implementation could stream multiple copies of an object during a push,
and index-pack on the receiving end would simply store what it received
to disk as is, without a fuss.

> Given the dates on the packs and how rare this is, I'm pretty much
> willing to chalk it up to a random bug (in git or otherwise) that does
> not any longer exist.

Possibly.  Given that this does not compromise the validity of the pack,
and that a simple repack "fixes" it, I would not worry too much about it.


Nicolas
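For illustration, the two remedies discussed in this thread (a lookup that tolerates duplicate index entries, and a check that notices duplicates at indexing time) can be sketched in C over a table of 20-byte SHA-1 names sorted in memcmp() order. This is a minimal, hypothetical sketch: the helper names find_duplicate_sha1() and lookup_sha1() and the flat-array layout are assumptions for illustration, not git's API, and the lookup is a plain binary search swapped in for clarity rather than git's own sha1_entry_pos().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_RAWSZ 20

/* Hypothetical helper: address of entry i in a flat table of raw SHA-1s. */
static const unsigned char *sha1_at(const unsigned char *table, size_t i)
{
	return table + i * SHA1_RAWSZ;
}

/*
 * Hypothetical indexing-time check: in a table sorted in memcmp() order,
 * duplicates are always adjacent, so one linear pass finds them.
 * Returns the index of the second copy of the first duplicate, or -1.
 */
static long find_duplicate_sha1(const unsigned char *table, size_t nr)
{
	assert(table != NULL || nr == 0);
	for (size_t i = 1; i < nr; i++)
		if (!memcmp(sha1_at(table, i - 1), sha1_at(table, i), SHA1_RAWSZ))
			return (long)i;
	return -1;
}

/*
 * Hypothetical lenient lookup: a lower-bound binary search that makes no
 * uniqueness assumption.  Returns the lowest index whose name equals key,
 * or -1 if key is absent; duplicates are tolerated rather than asserted on.
 */
static long lookup_sha1(const unsigned char *table, size_t nr,
			const unsigned char *key)
{
	size_t lo = 0, hi = nr;

	assert(key != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (memcmp(sha1_at(table, mid), key, SHA1_RAWSZ) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo < nr && !memcmp(sha1_at(table, lo), key, SHA1_RAWSZ))
		return (long)lo;
	return -1;
}
```

Because sorting places any duplicate names next to each other, the check costs a single pass once the object names have been sorted for the .idx; the lookup simply returns the first matching position instead of insisting that the match be unique.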