On Tue, Sep 26, 2023 at 09:08:59PM -0700, Junio C Hamano wrote: > Taylor Blau <me@xxxxxxxxxxxx> writes: > > > Hmm. I wonder if retaining some flexibility in the bulk-checkin > > mechanism may be worthwhile. We discussed at the Contributor's > > Summit[^1] today that the bulk-checkin system may be a good fit for > > packing any blobs/trees created by `merge-tree` or `replay` instead of > > writing them out as loose objects. > > But see the last paragraph of my review comments for the earlier > round upthread. This particular function implements logic that is > only applicable to blob objects, and streaming trees, commits, and > tags will need their own separate helper functions. And when they > are written, the top-level stream_to_pack() function can be > reintroduced, which will be a thin dispatcher to the four > type-specific helpers. I am not sure that I follow. If we have an address in memory from which we want to stream raw bytes directly to the packfile, that should work for all objects regardless of type, no? Having stream_to_pack() take a non-OBJ_BLOB 'type' argument would be OK provided that the file descriptor 'fd' contains the raw contents of an object which matches type 'type'. IIUC, for callers like in the ORT backend which assemble e.g. the raw bytes of a tree in its merge-ort.c::write_tree() function like so: for (i = 0; i < nr; i++) { struct merged_info *mi = versions->items[offset+i].util; struct version_info *ri = &mi->result; strbuf_addf(&buf, "%o %s%c", ri->mode, versions->items[offset+i].string, '\0'); strbuf_add(&buf, ri->oid.hash, hash_size); } we'd want some variant of stream_to_pack() that acts on a 'void *, size_t' pair rather than an 'int (fd), size_t' pair. Likely its signature would look something like: /* write raw bytes to a bulk-checkin pack */ static int write_to_pack(struct bulk_checkin_packfile *state, git_hash_ctx *ctx, off_t *already_hashed_to, void *ptr, size_t size, enum object_type type, unsigned flags); /* write an object from memory to a bulk-checkin pack */ static int deflate_to_pack_mem(struct bulk_checkin_packfile *state, struct object_id *result_oid, void *ptr, size_t size, enum object_type type, unsigned flags); , where the above are analogous to `stream_to_pack()` and `deflate_to_pack()`, respectively. ORT would be taught to conditionally replace calls like: write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid); with: deflate_to_pack_mem(&state, result_oid, buf.buf, buf.len, OBJ_TREE, HASH_WRITE_OBJECT); I guess after writing all of that out, you'd never have any callers of the existing `deflate_to_pack()` function that pass a file descriptor containing the contents of a non-blob object. So in that sense, I don't think that my proposal would change anything about this patch. But I worry that I am missing something here, so having a sanity check would be appreciated ;-). Thanks, Taylor