On Fri, Dec 03 2021, Han Xin wrote: > From: Han Xin <hanxin.hx@xxxxxxxxxxxxxxx> > [..] > +static void write_stream_blob(unsigned nr, unsigned long size) > +{ > + char hdr[32]; > + int hdrlen; > + git_zstream zstream; > + struct input_zstream_data data; > + struct input_stream in_stream = { > + .read = feed_input_zstream, > + .data = &data, > + .size = size, > + }; > + struct object_id *oid = &obj_list[nr].oid; > + int ret; > + > + memset(&zstream, 0, sizeof(zstream)); > + memset(&data, 0, sizeof(data)); > + data.zstream = &zstream; > + git_inflate_init(&zstream); > + > + /* Generate the header */ > + hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX, type_name(OBJ_BLOB), (uintmax_t)size) + 1; > + > + if ((ret = write_loose_object(oid, hdr, hdrlen, &in_stream, 0, 0))) > + die(_("failed to write object in stream %d"), ret); > + > + if (zstream.total_out != size || data.status != Z_STREAM_END) > + die(_("inflate returned %d"), data.status); > + git_inflate_end(&zstream); > + > + if (strict && !dry_run) { > + struct blob *blob = lookup_blob(the_repository, oid); > + if (blob) > + blob->object.flags |= FLAG_WRITTEN; > + else > + die("invalid blob object from stream"); > + } > + obj_list[nr].obj = NULL; > +} Just a side-note, I think (but am not 100% sure) that these existing occurances aren't needed due to our use of CALLOC_ARRAY(): diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index 4a9466295ba..00b349412c5 100644 --- a/builtin/unpack-objects.c +++ b/builtin/unpack-objects.c @@ -248,7 +248,6 @@ static void write_object(unsigned nr, enum object_type type, die("failed to write object"); added_object(nr, type, buf, size); free(buf); - obj_list[nr].obj = NULL; } else if (type == OBJ_BLOB) { struct blob *blob; if (write_object_file(buf, size, type_name(type), @@ -262,7 +261,6 @@ static void write_object(unsigned nr, enum object_type type, blob->object.flags |= FLAG_WRITTEN; else die("invalid blob object"); - obj_list[nr].obj = NULL; } else { struct object *obj; int eaten; The reason I'm noting it is that the same seems to be true of your new addition here. I.e. are these assignments to NULL needed? Anyway, the reason I started poking at this it tha this write_stream_blob() seems to duplicate much of write_object(). AFAICT only the writing part is really different, the part where we lookup_blob() after, set FLAG_WRITTEN etc. is all the same. Why can't we call write_object() here? The obvious answer seems to be that the call to write_object_file() isn't prepared to do the sort of streaming that you want, so instead you're bypassing it and calling write_loose_object() directly. I haven't tried this myself, but isn't a better and cleaner approach here to not add another meaning to what is_null_oid() means, but to just add a HASH_STREAM flag that'll get passed down as "unsigned flags" to write_loose_object()? See FLAG_BITS in object.h. Then the "obj_list[nr].obj" here could also become "obj_list[nr].obj.flags |= (1u<<12)" or whatever (but that wouldn't strictly be needed I think. But by adding the "HASH_STREAM" flag you could I think stop duplicating the "Generate the header" etc. here and call write_object_file_flags(). I don't so much care about how it's done within unpack-objects.c, but not having another meaning to is_null_oid() in play would be really nice, and it this case it seems entirely avoidable.