Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack`

On Fri, Nov 10, 2023 at 03:51:18PM -0800, Elijah Newren wrote:
> > @@ -2108,10 +2109,19 @@ static int handle_content_merge(struct merge_options *opt,
> >                 if ((merge_status < 0) || !result_buf.ptr)
> >                         ret = error(_("failed to execute internal merge"));
> >
> > -               if (!ret &&
> > -                   write_object_file(result_buf.ptr, result_buf.size,
> > -                                     OBJ_BLOB, &result->oid))
> > -                       ret = error(_("unable to add %s to database"), path);
> > +               if (!ret) {
> > +                       ret = opt->write_pack
> > +                               ? index_blob_bulk_checkin_incore(&result->oid,
> > +                                                                result_buf.ptr,
> > +                                                                result_buf.size,
> > +                                                                path, 1)
> > +                               : write_object_file(result_buf.ptr,
> > +                                                   result_buf.size,
> > +                                                   OBJ_BLOB, &result->oid);
> > +                       if (ret)
> > +                               ret = error(_("unable to add %s to database"),
> > +                                           path);
> > +               }
> >
> >                 free(result_buf.ptr);
> >                 if (ret)
>
> This is unsafe; the object may need to be read later within the same
> merge.  Perhaps the simplest example related to your testcase is
> modifying the middle section of that testcase (I'll highlight where
> when I comment on the testcase) to read:
>
>     test_commit -C repo base file "$(test_seq 3 5)" &&
>     git -C repo branch -M main &&
>     git -C repo checkout -b side main &&
>     test_commit -C repo side-file file "$(test_seq 1 5)" &&
>     test_commit -C repo side-file2 file2 "$(test_seq 1 6)" &&
>     git -C repo checkout -b other main &&
>     test_commit -C repo other-file file "$(test_seq 3 7)" &&
>     git -C repo mv file file2 &&
>     git -C repo commit -m other-file2 &&
>
>     find repo/$packdir -type f -name "pack-*.idx" >packs.before &&
>     git -C repo merge-tree --write-pack side other &&
>
> In words, what I'm doing here is having both sides modify "file" (the
> same as you did) but also having one side rename "file"->"file2".  The
> side that doesn't rename "file" also introduces a new "file2".  ort
> needs to merge the three versions of "file" to get a new blob object,
> and then merge that with the content from the brand new "file2".  More
> complicated cases are possible, of course.  Anyway, with the modified
> testcase above, merge-tree will fail with:
>
>     fatal: unable to read blob object 06e567b11dfdafeaf7d3edcc89864149383aeab6
>
> I think (untested) that you could fix this by creating two packs
> instead of just one.  In particular, add a call to
> flush_odb_transaction() after the "redo_after_renames" block in
> merge_ort_nonrecursive_internal().  (It's probably easier to see that
> you could place the flush_odb_transaction() call inside
> detect_and_process_renames() just after the process_renames() call,
> but when redo_after_renames is true you'd end up with three packs
> instead of just two).

Great analysis, thanks for catching this error. I tested your approach,
and indeed a flush_odb_transaction() call after the process_renames()
call in detect_and_process_renames() does do the trick.

> What happens with the odb transaction stuff if no new objects are
> written before the call to flush_odb_transaction?  Will that be a
> problem?

I think the bulk-checkin code handles this gracefully: flushing when
there are no staged objects is a no-op, so it shouldn't be a problem.

> (Since any tree written will not be re-read within the same merge, the
> other write_object_file() call you changed does not have the same
> problem as the one here.)
>
> >@@ -4332,9 +4349,13 @@ static int process_entries(struct merge_options *opt,
> >                fflush(stdout);
> >                BUG("dir_metadata accounting completely off; shouldn't happen");
> >        }
> >-       if (write_tree(result_oid, &dir_metadata.versions, 0,
> >+       if (write_tree(opt, result_oid, &dir_metadata.versions, 0,
> >                       opt->repo->hash_algo->rawsz) < 0)
> >                ret = -1;
> >
> > +
> > +       if (opt->write_pack)
> > +               end_odb_transaction();
> > +
>
> Is the end_odb_transaction() here going to fail with an "Unbalanced
> ODB transaction nesting" when faced with a recursive merge?

I think so, and we should have a test-case demonstrating that. In the
remaining three patches that I posted to extend this approach to 'git
replay', I moved this call elsewhere in such a way that I think it
avoids this problem.

> Perhaps flushing here, and then calling end_odb_transaction() in
> merge_finalize(), much as you do in your replay-and-write-pack series,
> should actually be moved to this commit and included here?

Yep, exactly.

> This does mean that for a recursive merge, that you'll get up to 2*N
> packfiles, where N is the depth of the recursive merge.

We definitely want to avoid that ;-). I think there are a couple of
potential directions forward here, but the most promising one I think is
due to Johannes, who suggested that we write loose objects into a
temporary directory with a replace_tmp_objdir() call, and then repack
that side directory before migrating a single pack back into the main
object store.

Thanks,
Taylor
