On Wed, Nov 13, 2024 at 10:26:56AM -0800, Jonathan Tan wrote: > > The reason these window tests are broken is that the earlier "build pack > > index for an existing pack" is now finding and storing deltas in a new > > pack when it does this: > > > > git index-pack --promisor=message test-3.pack && > > > > But that command is indexing a pack that is not even in the repository's > > object store at all! Yet it triggers a call to pack-objects that repacks > > within that object store. > > As far as I know, index-pack, when run as part of fetch, indexes a pack > that's not in the repository's object store; it indexes a packfile in a > temp directory. (So I don't think this is a strange thing to do.) When fetching (or receiving a push), we use "index-pack --stdin" and do write the resulting pack into the repository (and the command will complain if there is no repository). So I think what the test is doing above (using --promisor on a random packfile) is questionable. It goes back to 1f52cdfacb (index-pack: document and test the --promisor option, 2022-03-09). I wonder if that test is actually valuable now that the --promisor option is actually triggered via fetch. If we restricted --promisor to work only with --stdin, that would deal with both of the issues I saw. And I suspect (but didn't dig deeply or test) that would be sufficient for how it is called within git. (I wondered briefly if bundles might index in-place, but they seem to use --stdin also). Where it gets weirder to me is with quarantine directories (and maybe this is what you meant above). On receiving a push, we "index --stdin" into a temporary quarantine directory. If that kicks off a pack-objects run, where does that pack-objects put its new pack? Within the quarantined index-pack we set GIT_OBJECT_DIRECTORY to the quarantine and add the original repo as an alternate. So I _think_ both the pushed-up pack and the repacked promisor pack would go into the quarantine dir, and then we'd migrate both (or neither) when we commit to the push. Which is OK, but I don't know that I thought that far ahead when writing the quarantine stuff long ago. It's probably somewhat academic right now, as I'm not sure if you can even push reliably into a promisor repo (and it doesn't look like receive-pack knows about passing --promisor anyway). We don't quarantine on fetch right now, though we have discussed it in the past (and I think we should consider doing it). So this may become more real in the future. I wonder if there is a way to add a test to future-proof against changes to how the quarantine system works. The theoretical problem case is if we did quarantine fetches, but accidentally wrote the new promisor pack into the main repo instead of the quarantine, and then a fetch rejected the incoming pack (because of a hook, failed connectivity check, etc). Then we'd end up with the new promisor pack when we shouldn't, which I guess could move objects from that incoming pack that we rejected into the main repo, despite the quarantine? I can't think of a way to test that now, without the quarantine-on-fetch feature existing. > > I'd guess in the real world, we'd only pass that option when indexing > > packs that we just fetched. But as a bystander to this feature, it feels > > quite odd to me that index-pack, which I generally consider a "read > > only" operation except for the index it was asked to write, would be > > creating a new pack like this. I didn't follow the topic closely enough > > to comment more intelligently, but would it be possible for the caller > > of index-pack to trigger the repack as an independent step? > > I thought of that, but as far as I know, during a fetch, index-pack is > the only time in which the objects in the fetched pack are uncompressed > in memory. There have been concerns about the performance of various > ways of solving the promisor-object-and-GC bug, so I took an approach > that minimizes the performance hit as much as possible, by avoiding yet > another uncompression (we need to uncompress the objects to find their > outgoing links, so that we know what to repack). Hmm, yeah. I can see the appeal of doing the processing there. Kicking off a pack-objects can be similarly expensive in the worst case, but in practice it should only be dealing with a few new objects (and we want to cheaply find out what those objects are, if there are even any at all). > We definitely should prevent the segfault, but I think that's better > done by making --promisor only work if we run index-pack from within a > repo. I don't think we can restrict the repacking to run only if we're > indexing a pack within the repo, because in our fetch case, we're > indexing a new pack - not one within the repo. I think the "--stdin" thing above neatly solves this. > Maybe we could conceptualize "index-pack --promisor" as the pack > giving "testimony" about objects that its objects link to, so we can > update our own records. Yeah, I guess the fundamental thing here is that anybody who isn't passing "--promisor" is not going to be affected, so that at least limits the opportunity for surprise. The quarantine discussion above is an example of how there could be unexpected consequences. I _think_ it's OK based on what I wrote, but hopefully that explains my general feeling of surprise. I dunno. It still may be the least bad thing. -Peff