Re: [PATCH v3] bulk-checkin: only support blobs in index_bulk_checkin

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 26, 2023 at 09:08:59PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> > Hmm. I wonder if retaining some flexibility in the bulk-checkin
> > mechanism may be worthwhile. We discussed at the Contributor's
> > Summit[^1] today that the bulk-checkin system may be a good fit for
> > packing any blobs/trees created by `merge-tree` or `replay` instead of
> > writing them out as loose objects.
>
> But see the last paragraph of my review comments for the earlier
> round upthread.  This particular function implements logic that is
> only applicable to blob objects, and streaming trees, commits, and
> tags will need their own separate helper functions.  And when they
> are written, the top-level stream_to_pack() function can be
> reintroduced, which will be a thin dispatcher to the four
> type-specific helpers.

I am not sure that I follow. If we have an address in memory from which
we want to stream raw bytes directly to the packfile, that should work
for all objects regardless of type, no?

Having stream_to_pack() take a non-OBJ_BLOB 'type' argument would be OK
provided that the file descriptor 'fd' contains the raw contents of an
object which matches type 'type'.

IIUC, for callers like in the ORT backend which assemble e.g. the raw
bytes of a tree in its merge-ort.c::write_tree() function like so:

    for (i = 0; i < nr; i++) {
        struct merged_info *mi = versions->items[offset+i].util;
        struct version_info *ri = &mi->result;

        strbuf_addf(&buf, "%o %s%c", ri->mode,
                    versions->items[offset+i].string, '\0');
        strbuf_add(&buf, ri->oid.hash, hash_size);
    }

we'd want some variant of stream_to_pack() that acts on a 'void *,
size_t' pair rather than an 'int (fd), size_t' pair. Likely its
signature would look something like:

    /* write raw bytes to a bulk-checkin pack */
    static int write_to_pack(struct bulk_checkin_packfile *state,
                             git_hash_ctx *ctx, off_t *already_hashed_to,
                             void *ptr, size_t size, enum object_type type,
                             unsigned flags);

    /* write an object from memory to a bulk-checkin pack */
    static int deflate_to_pack_mem(struct bulk_checkin_packfile *state,
                                   struct object_id *result_oid,
                                   void *ptr, size_t size,
                                   enum object_type type, unsigned flags);

, where the above are analogous to `stream_to_pack()` and
`deflate_to_pack()`, respectively. ORT would be taught to conditionally
replace calls like:

    write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid);

with:

    deflate_to_pack_mem(&state, result_oid, buf.buf, buf.len,
                        OBJ_TREE, HASH_WRITE_OBJECT);

I guess after writing all of that out, you'd never have any callers of
the existing `deflate_to_pack()` function that pass a file descriptor
containing the contents of a non-blob object. So in that sense, I don't
think that my proposal would change anything about this patch.

But I worry that I am missing something here, so having a sanity check
would be appreciated ;-).

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux