On Fri, Nov 10, 2023 at 03:51:18PM -0800, Elijah Newren wrote:

> > @@ -2108,10 +2109,19 @@ static int handle_content_merge(struct merge_options *opt,
> >  		if ((merge_status < 0) || !result_buf.ptr)
> >  			ret = error(_("failed to execute internal merge"));
> >
> > -		if (!ret &&
> > -		    write_object_file(result_buf.ptr, result_buf.size,
> > -				      OBJ_BLOB, &result->oid))
> > -			ret = error(_("unable to add %s to database"), path);
> > +		if (!ret) {
> > +			ret = opt->write_pack
> > +				? index_blob_bulk_checkin_incore(&result->oid,
> > +								 result_buf.ptr,
> > +								 result_buf.size,
> > +								 path, 1)
> > +				: write_object_file(result_buf.ptr,
> > +						    result_buf.size,
> > +						    OBJ_BLOB, &result->oid);
> > +			if (ret)
> > +				ret = error(_("unable to add %s to database"),
> > +					    path);
> > +		}
> >
> >  		free(result_buf.ptr);
> >  		if (ret)
>
> This is unsafe; the object may need to be read later within the same
> merge. Perhaps the simplest example related to your testcase is
> modifying the middle section of that testcase (I'll highlight where
> when I comment on the testcase) to read:
>
> 	test_commit -C repo base file "$(test_seq 3 5)" &&
> 	git -C repo branch -M main &&
> 	git -C repo checkout -b side main &&
> 	test_commit -C repo side-file file "$(test_seq 1 5)" &&
> 	test_commit -C repo side-file2 file2 "$(test_seq 1 6)" &&
> 	git -C repo checkout -b other main &&
> 	test_commit -C repo other-file file "$(test_seq 3 7)" &&
> 	git -C repo mv file file2 &&
> 	git -C repo commit -m other-file2 &&
>
> 	find repo/$packdir -type f -name "pack-*.idx" >packs.before &&
> 	git -C repo merge-tree --write-pack side other &&
>
> In words, what I'm doing here is having both sides modify "file" (the
> same as you did) but also having one side rename "file"->"file2". The
> side that doesn't rename "file" also introduces a new "file2". ort
> needs to merge the three versions of "file" to get a new blob object,
> and then merge that with the content from the brand new "file2". More
> complicated cases are possible, of course. Anyway, with the modified
> testcase above, merge-tree will fail with:
>
> 	fatal: unable to read blob object 06e567b11dfdafeaf7d3edcc89864149383aeab6
>
> I think (untested) that you could fix this by creating two packs
> instead of just one. In particular, add a call to
> flush_odb_transaction() after the "redo_after_renames" block in
> merge_ort_nonrecursive_internal(). (It's probably easier to see that
> you could place the flush_odb_transaction() call inside
> detect_and_process_renames() just after the process_renames() call,
> but when redo_after_renames is true you'd end up with three packs
> instead of just two.)

Great analysis, thanks for catching this error. I tested your approach,
and indeed a flush_odb_transaction() call after the process_renames()
call in detect_and_process_renames() does do the trick (sketched below).

> What happens with the odb transaction stuff if no new objects are
> written before the call to flush_odb_transaction? Will that be a
> problem?

I don't think so; the bulk-checkin code is flexible enough to know that
it shouldn't do anything when there aren't any objects to pack.

> (Since any tree written will not be re-read within the same merge, the
> other write_object_file() call you changed does not have the same
> problem as the one here.)
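For concreteness, the fix I tested boils down to the following (typed
out here from memory, so treat it as a sketch rather than the exact
patch; everything around the new opt->write_pack block is abbreviated):

	/* merge-ort.c, inside detect_and_process_renames(): */
	...
	clean &= process_renames(opt, &combined);

	/*
	 * Blobs written while resolving renames (e.g. the content
	 * merge of "file" in the testcase above) may need to be
	 * re-read later in this same merge, so flush the in-progress
	 * pack here; subsequent writes then open a second pack.
	 */
	if (opt->write_pack)
		flush_odb_transaction();
	...

With that in place, your modified testcase passes, at the cost of
writing (up to) two packs per merge instead of one.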
> > @@ -4332,9 +4349,13 @@ static int process_entries(struct merge_options *opt,
> >  		fflush(stdout);
> >  		BUG("dir_metadata accounting completely off; shouldn't happen");
> >  	}
> > -	if (write_tree(result_oid, &dir_metadata.versions, 0,
> > +	if (write_tree(opt, result_oid, &dir_metadata.versions, 0,
> >  		       opt->repo->hash_algo->rawsz) < 0)
> >  		ret = -1;
> >
> > +
> > +	if (opt->write_pack)
> > +		end_odb_transaction();
> > +
>
> Is the end_odb_transaction() here going to fail with an "Unbalanced
> ODB transaction nesting" when faced with a recursive merge?

I think so, and we should have a test case demonstrating that. In the
remaining three patches that I posted to extend this approach to 'git
replay', I moved this call elsewhere in such a way that I think avoids
this problem.

> Perhaps flushing here, and then calling end_odb_transaction() in
> merge_finalize(), much as you do in your replay-and-write-pack series,
> should actually be moved to this commit and included here?

Yep, exactly (see the sketch in the postscript below).

> This does mean that for a recursive merge, you'll get up to 2*N
> packfiles, where N is the depth of the recursive merge.

We definitely want to avoid that ;-). I think there are a couple of
potential directions forward here, but the most promising one, I think,
is due to Johannes, who suggests that we write loose objects into a
temporary directory with a replace_tmp_objdir() call, and then repack
that side directory before migrating a single pack back into the main
object store.

Thanks,
Taylor
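P.S. To spell out the restructuring agreed on above, the shape I have
in mind for the next reroll is roughly the following (an untested
sketch; everything but the two opt->write_pack blocks is abbreviated):

	/*
	 * At the end of process_entries(): flush, but do not end, the
	 * transaction, so that an inner (recursive) merge does not
	 * unbalance the nesting by ending it early:
	 */
	if (opt->write_pack)
		flush_odb_transaction();

	/*
	 * ...and end the transaction exactly once, in
	 * merge_finalize(), after the whole (possibly recursive)
	 * merge is done:
	 */
	static void merge_finalize(struct merge_options *opt,
				   struct merge_result *result)
	{
		...
		if (opt->write_pack)
			end_odb_transaction();
	}

And for the 2*N-packs problem, the temporary-objdir direction would
look something like this (very rough; I haven't verified that the
tmp-objdir API can be driven exactly this way from merge-ort, so take
the calls below as illustrative):

	struct tmp_objdir *tmp = tmp_objdir_create("merge");

	tmp_objdir_replace_primary_odb(tmp, 0);
	/* run the (recursive) merge; new objects land in 'tmp' */
	/* repack 'tmp' into a single pack, then migrate it back: */
	tmp_objdir_migrate(tmp);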