On Fri, Dec 08, 2023 at 05:40:29PM -0800, Junio C Hamano wrote: > Taylor Blau <me@xxxxxxxxxxxx> writes: > > > Hopefully you're satisfied with the way things are split up and > > organized currently, but if you have suggestions on other ways I could > > slice or dice this, please let me know. > > I did wonder how expensive to recompute and validate the "distinct" > information (in other words, is it too expensive for the consumers > of an existing midx file to see which packs are distinct on demand > before they stream contents out of the underlying packs?), as the > way the packs are marked as distinct looked rather error prone (you > can very easily list packfiles with overlaps with "+" prefix and the > DISK chunk writer does not even notice that you lied to it). As long > as "git fsck" catches when two packs that are marked as distinct share > an object, that is OK, but the arrangement did look rather brittle > to me. It's likely too expensive to do on the reading side for every pack-objects operation or MIDX load. But we do check this property when we write the MIDX, see these lines from midx.c::get_sorted_entries(): for (cur_object = 0; cur_object < fanout.nr; cur_object++) { struct pack_midx_entry *ours = &fanout.entries[cur_object]; if (cur_object) { struct pack_midx_entry *prev = &fanout.entries[cur_object - 1]; if (oideq(&prev->oid, &ours->oid)) { if (prev->disjoint && ours->disjoint) die(_("duplicate object '%s' among disjoint packs '%s', '%s'"), oid_to_hex(&prev->oid), info[prev->pack_int_id].pack_name, info[ours->pack_int_id].pack_name); continue; } } This series doesn't yet have a corresponding step in the fsck builtin, but I will investigate adding one. I'm happy to include it in a subsequent round here, but I worry that this series is already on the verge of being too complex as-is, so it may be nice as a follow-up, too. Thanks, Taylor