This series implements support for a new merge-tree option,
`--write-pack`, which causes any newly-written objects to be packed
together instead of being stored individually as loose.

The notable change from last time is in response to a suggestion[1] from
Junio to factor out an abstract bulk-checkin "source", which ended up
reducing the duplication between a couple of functions in the earlier
round by a significant degree.

Beyond that, the changes since last time can be viewed in the range-diff
below. Thanks in advance for any review!

[1]: https://lore.kernel.org/git/xmqq5y34wu5f.fsf@gitster.g/

Taylor Blau (10):
  bulk-checkin: factor out `format_object_header_hash()`
  bulk-checkin: factor out `prepare_checkpoint()`
  bulk-checkin: factor out `truncate_checkpoint()`
  bulk-checkin: factor out `finalize_checkpoint()`
  bulk-checkin: extract abstract `bulk_checkin_source`
  bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source`
  bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
  bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
  bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
  builtin/merge-tree.c: implement support for `--write-pack`

 Documentation/git-merge-tree.txt |   4 +
 builtin/merge-tree.c             |   5 +
 bulk-checkin.c                   | 288 +++++++++++++++++++++++++------
 bulk-checkin.h                   |   8 +
 merge-ort.c                      |  42 ++++-
 merge-recursive.h                |   1 +
 t/t4301-merge-tree-write-tree.sh |  93 ++++++++++
 7 files changed, 381 insertions(+), 60 deletions(-)

Range-diff against v2:
 1:  edf1cbafc1 =  1:  2dffa45183 bulk-checkin: factor out `format_object_header_hash()`
 2:  b3f89d5853 =  2:  7a10dc794a bulk-checkin: factor out `prepare_checkpoint()`
 3:  abe4fb0a59 =  3:  20c32d2178 bulk-checkin: factor out `truncate_checkpoint()`
 4:  0b855a6eb7 !  4:  893051d0b7 bulk-checkin: factor our `finalize_checkpoint()`
    @@ Metadata
     Author: Taylor Blau <me@xxxxxxxxxxxx>
     
      ## Commit message ##
    -    bulk-checkin: factor our `finalize_checkpoint()`
    +    bulk-checkin: factor out `finalize_checkpoint()`
     
         In a similar spirit as previous commits, factor out the routine to
         finalize the just-written object from the bulk-checkin mechanism.
 -:  ---------- >  5:  da52ec8380 bulk-checkin: extract abstract `bulk_checkin_source`
 -:  ---------- >  6:  4e9bac5bc1 bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source`
 -:  ---------- >  7:  04ec74e357 bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
 5:  239bf39bfb !  8:  8667b76365 bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
    @@ Commit message
         entrypoint delegates to `deflate_blob_to_pack_incore()`, which is
         responsible for formatting the pack header and then deflating the
         contents into the pack. The latter is accomplished by calling
    -    deflate_blob_contents_to_pack_incore(), which takes advantage of the
    -    earlier refactoring and is responsible for writing the object to the
    +    deflate_obj_contents_to_pack_incore(), which takes advantage of the
    +    earlier refactorings and is responsible for writing the object to the
         pack and handling any overage from pack.packSizeLimit.
     
         The bulk of the new functionality is implemented in the function
    -    `stream_obj_to_pack_incore()`, which is a generic implementation for
    -    writing objects of arbitrary type (whose contents we can fit in-core)
    -    into a bulk-checkin pack.
    -
    -    The new function shares an unfortunate degree of similarity to the
    -    existing `stream_blob_to_pack()` function. But DRY-ing up these two
    -    would likely be more trouble than it's worth, since the latter has to
    -    deal with reading and writing the contents of the object.
    +    `stream_obj_to_pack()`, which can handle streaming objects from memory
    +    to the bulk-checkin pack as a result of the earlier refactoring.
     
         Consistent with the rest of the bulk-checkin mechanism, there are no
         direct tests here. In future commits when we expose this new
    @@ Commit message
         Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
     
      ## bulk-checkin.c ##
    -@@ bulk-checkin.c: static int already_written(struct bulk_checkin_packfile *state, struct object_id
    - 	return 0;
    - }
    - 
    -+static int stream_obj_to_pack_incore(struct bulk_checkin_packfile *state,
    -+				     git_hash_ctx *ctx,
    -+				     off_t *already_hashed_to,
    -+				     const void *buf, size_t size,
    -+				     enum object_type type,
    -+				     const char *path, unsigned flags)
    -+{
    -+	git_zstream s;
    -+	unsigned char obuf[16384];
    -+	unsigned hdrlen;
    -+	int status = Z_OK;
    -+	int write_object = (flags & HASH_WRITE_OBJECT);
    -+
    -+	git_deflate_init(&s, pack_compression_level);
    -+
    -+	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), type, size);
    -+	s.next_out = obuf + hdrlen;
    -+	s.avail_out = sizeof(obuf) - hdrlen;
    -+
    -+	if (*already_hashed_to < size) {
    -+		size_t hsize = size - *already_hashed_to;
    -+		if (hsize) {
    -+			the_hash_algo->update_fn(ctx, buf, hsize);
    -+		}
    -+		*already_hashed_to = size;
    -+	}
    -+	s.next_in = (void *)buf;
    -+	s.avail_in = size;
    -+
    -+	while (status != Z_STREAM_END) {
    -+		status = git_deflate(&s, Z_FINISH);
    -+		if (!s.avail_out || status == Z_STREAM_END) {
    -+			if (write_object) {
    -+				size_t written = s.next_out - obuf;
    -+
    -+				/* would we bust the size limit? */
    -+				if (state->nr_written &&
    -+				    pack_size_limit_cfg &&
    -+				    pack_size_limit_cfg < state->offset + written) {
    -+					git_deflate_abort(&s);
    -+					return -1;
    -+				}
    -+
    -+				hashwrite(state->f, obuf, written);
    -+				state->offset += written;
    -+			}
    -+			s.next_out = obuf;
    -+			s.avail_out = sizeof(obuf);
    -+		}
    -+
    -+		switch (status) {
    -+		case Z_OK:
    -+		case Z_BUF_ERROR:
    -+		case Z_STREAM_END:
    -+			continue;
    -+		default:
    -+			die("unexpected deflate failure: %d", status);
    -+		}
    -+	}
    -+	git_deflate_end(&s);
    -+	return 0;
    -+}
    -+
    - /*
    -  * Read the contents from fd for size bytes, streaming it to the
    -  * packfile in state while updating the hash in ctx. Signal a failure
     @@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *state,
      	}
      }
      
    @@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
     +{
     +	struct pack_idx_entry *idx = NULL;
     +	off_t already_hashed_to = 0;
    ++	struct bulk_checkin_source source = {
    ++		.type = SOURCE_INCORE,
    ++		.buf = buf,
    ++		.size = size,
    ++		.read = 0,
    ++		.path = path,
    ++	};
     +
     +	/* Note: idx is non-NULL when we are writing */
     +	if (flags & HASH_WRITE_OBJECT)
    @@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
     +
     +	while (1) {
     +		prepare_checkpoint(state, checkpoint, idx, flags);
    -+		if (!stream_obj_to_pack_incore(state, ctx, &already_hashed_to,
    -+					       buf, size, type, path, flags))
    ++
    ++		if (!stream_obj_to_pack(state, ctx, &already_hashed_to, &source,
    ++					type, flags))
     +			break;
     +		truncate_checkpoint(state, checkpoint, idx);
    ++		bulk_checkin_source_seek_to(&source, 0);
     +	}
     +
     +	finalize_checkpoint(state, ctx, checkpoint, idx, result_oid);
 6:  57613807d8 =  9:  cba043ef14 bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
 7:  f21400f56c = 10:  ae70508037 builtin/merge-tree.c: implement support for `--write-pack`
-- 
2.42.0.408.g97fac66ae4