On Mon, Nov 13, 2023 at 05:02:54PM -0500, Jeff King wrote: > On Fri, Nov 10, 2023 at 03:51:18PM -0800, Elijah Newren wrote: > > > This is unsafe; the object may need to be read later within the same > > merge. [...] > > > > I think (untested) that you could fix this by creating two packs > > instead of just one. In particular, add a call to > > flush_odb_transcation() after the "redo_after_renames" block in > > merge_ort_nonrecursive_internal(). > > It might not be too hard to just let in-process callers access the > objects we've written. A quick and dirty patch is below, which works > with the test you suggested (the test still fails because it finds a > conflict, but it gets past the "woah, I can't find that sha1" part). That's a very slick idea, and I think that this series has some legs to stand on now as a result. There are a few of interesting conclusions we can draw here: 1. This solves the recursive merge problem that Elijah mentioned earlier where we could generate up to 2^N packs (where N is the maximum depth of the recursive merge). 2. This also solves the case where the merge-ort code needs to read an object that it wrote earlier on in the same process without having to flush out intermediate packs. So in the best case we need only a single pack (instead of two). 3. This also solves the 'replay' issue, *and* allows us to avoid the tmp-objdir thing there entirely, since we can simulate object reads in the bulk-checkin pack. So I think that this is a direction worth pursuing. In terms of making those lookups faster, I think that what you'd want is an oidmap. The overhead is slightly unfortunate, but I think that any other solution which requires keeping the written array in sorted order would in practice be more expensive as you have to constantly reallocate and copy portions of the array around to maintain its invariant. > I don't know if that is sufficient, though. Would other spawned > processes (hooks, external merge drivers, and so on) need to be able to > see these objects, too? Interesting point. In theory those processes could ask about newly created objects, and if they were spawned before the bulk-checkin pack was flushed, those lookups would fail. But that requires that, e.g. for hooks, that we know a-priori the object ID of some newly-written objects. If you wanted to make those lookups succeed, I think there are a couple of options: - teach child-processes about the bulk-checkin pack, and let them perform the same fake lookup - flush (but do not close) the bulk-checkin transaction In any event, I think that this is a sufficiently rare and niche case that we'd be OK to declare that you should not expect the above scenarios to work when using `--write-pack`. If someone does ask for that feature in the future, we could implement it relatively painlessly using one of the above options. Thanks, Taylor