On Thu, Mar 24 2022, Neeraj Singh via GitGitGadget wrote:

> From: Neeraj Singh <neerajsi@xxxxxxxxxxxxx>
>
> Make it clearer in the naming and documentation of the plug_bulk_checkin
> and unplug_bulk_checkin APIs that they can be thought of as
> a "transaction" to optimize operations on the object database.
>
> Signed-off-by: Neeraj Singh <neerajsi@xxxxxxxxxxxxx>
> ---
>  builtin/add.c  |  4 ++--
>  bulk-checkin.c |  4 ++--
>  bulk-checkin.h | 14 ++++++++++++--
>  3 files changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index 3ffb86a4338..9bf37ceae8e 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -670,7 +670,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>  		string_list_clear(&only_match_skip_worktree, 0);
>  	}
>
> -	plug_bulk_checkin();
> +	begin_odb_transaction();
>
>  	if (add_renormalize)
>  		exit_status |= renormalize_tracked_files(&pathspec, flags);
> @@ -682,7 +682,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>
>  	if (chmod_arg && pathspec.nr)
>  		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
> -	unplug_bulk_checkin();
> +	end_odb_transaction();

Aside from anything else we've (dis)agreed on, I found this part really
odd when hacking on my RFC-on-top. Originally I (wrongly) thought that
plug_bulk_checkin() was something that originated with this series,
which adds the "bulk" mode.

But no, on second inspection it's a thing Junio added a long time ago
so that in this case we stream to a pack where we'd otherwise add N
loose objects.

Which (and I think Junio brought this up in an earlier round, but I
didn't fully understand it at the time) makes this whole thing quite
odd to me.

So first, shouldn't we add this begin_odb_transaction() as a new thing?
I.e. surely wanting to do that object target redirection within a given
begin/end "scope" should be orthogonal to how fsync() happens within
that "scope", even though in this case the two happen to correspond.
(See the sketch at the end of this mail for roughly what I have in
mind.)

And secondly, per the commit message and comment when it was added in
568508e7657 (bulk-checkin: replace fast-import based implementation,
2011-10-28): is it something we still need *for that purpose* once we
have the series to unpack-objects without malloc()ing the size of the
blob [1]?

And, if so, and orthogonal to that: if we know how to stream N objects
to a pack (as fast-import does), *and* we now (or SOON) know how to
stream loose objects without using size(blob) amounts of memory,
doesn't the "optimize fsync()" rather want to make use of the
stream-to-pack approach?

I.e. have you tried, for the cases where we create say 1k objects for
"git stash", to stream those to a pack? How does that compare (both
with and without the fsync changes)?

I.e. I do worry (also per [2]) that while the whole "bulk fsync" is
neat (and I think we can use it in either case, to defer object syncs
until the "index" or "ref" sync, as my RFC does), we're adding a bunch
of configuration and complexity for something that:

 1. Ultimately isn't all that important, as we can already mostly
    configure part of it away (i.e. "git-unpack-objects" vs. writing a
    pack, cf. transfer.unpackLimit).

 2. We don't have #1 for "add" and "update-index", but if we stream to
    packs, is there any remaining benefit in practice?

1. https://lore.kernel.org/git/cover-v11-0.8-00000000000-20220319T001411Z-avarab@xxxxxxxxx/
2. https://lore.kernel.org/git/220323.86fsn8ohg8.gmgdl@xxxxxxxxxxxxxxxxxxx/
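
To flesh out the "new thing" suggestion above: here's a rough sketch of
what I mean by keeping the pack redirection and the fsync batching as
two separate begin/end pairs. This is hand-waving, not a patch, and the
begin/end_odb_fsync_batch() names are made up purely for the sake of
the example:

	/* bulk-checkin.h-ish sketch, not a real patch */

	/*
	 * The existing concern (since 568508e7657): within this scope,
	 * redirect (large) object writes into one pack instead of
	 * creating N loose objects.
	 */
	void plug_bulk_checkin(void);
	void unplug_bulk_checkin(void);

	/*
	 * The new concern this series adds: defer the fsync() of
	 * objects written within this scope and flush once at the end.
	 * (Made-up names, just for this sketch.)
	 */
	void begin_odb_fsync_batch(void);
	void end_odb_fsync_batch(void);

The cmd_add() hunk quoted above would then simply open both scopes,
since there the two happen to coincide:

	begin_odb_fsync_batch();
	plug_bulk_checkin();
	/* ... renormalize/add/chmod the pathspec ... */
	unplug_bulk_checkin();
	end_odb_fsync_batch();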