On Fri, Dec 3, 2021 at 10:29 PM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote: > > > On Fri, Dec 03 2021, Han Xin wrote: > > > From: Han Xin <hanxin.hx@xxxxxxxxxxxxxxx> > > [..] > > +static void write_stream_blob(unsigned nr, unsigned long size) > > +{ > > + char hdr[32]; > > + int hdrlen; > > + git_zstream zstream; > > + struct input_zstream_data data; > > + struct input_stream in_stream = { > > + .read = feed_input_zstream, > > + .data = &data, > > + .size = size, > > + }; > > + struct object_id *oid = &obj_list[nr].oid; > > + int ret; > > + > > + memset(&zstream, 0, sizeof(zstream)); > > + memset(&data, 0, sizeof(data)); > > + data.zstream = &zstream; > > + git_inflate_init(&zstream); > > + > > + /* Generate the header */ > > + hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX, type_name(OBJ_BLOB), (uintmax_t)size) + 1; > > + > > + if ((ret = write_loose_object(oid, hdr, hdrlen, &in_stream, 0, 0))) > > + die(_("failed to write object in stream %d"), ret); > > + > > + if (zstream.total_out != size || data.status != Z_STREAM_END) > > + die(_("inflate returned %d"), data.status); > > + git_inflate_end(&zstream); > > + > > + if (strict && !dry_run) { > > + struct blob *blob = lookup_blob(the_repository, oid); > > + if (blob) > > + blob->object.flags |= FLAG_WRITTEN; > > + else > > + die("invalid blob object from stream"); > > + } > > + obj_list[nr].obj = NULL; > > +} > > Just a side-note, I think (but am not 100% sure) that these existing > occurances aren't needed due to our use of CALLOC_ARRAY(): > > diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c > index 4a9466295ba..00b349412c5 100644 > --- a/builtin/unpack-objects.c > +++ b/builtin/unpack-objects.c > @@ -248,7 +248,6 @@ static void write_object(unsigned nr, enum object_type type, > die("failed to write object"); > added_object(nr, type, buf, size); > free(buf); > - obj_list[nr].obj = NULL; > } else if (type == OBJ_BLOB) { > struct blob *blob; > if (write_object_file(buf, size, type_name(type), > @@ -262,7 +261,6 @@ static void write_object(unsigned nr, enum object_type type, > blob->object.flags |= FLAG_WRITTEN; > else > die("invalid blob object"); > - obj_list[nr].obj = NULL; > } else { > struct object *obj; > int eaten; > > The reason I'm noting it is that the same seems to be true of your new > addition here. I.e. are these assignments to NULL needed? > > Anyway, the reason I started poking at this it tha this > write_stream_blob() seems to duplicate much of write_object(). AFAICT > only the writing part is really different, the part where we > lookup_blob() after, set FLAG_WRITTEN etc. is all the same. > > Why can't we call write_object() here? > > The obvious answer seems to be that the call to write_object_file() > isn't prepared to do the sort of streaming that you want, so instead > you're bypassing it and calling write_loose_object() directly. > > I haven't tried this myself, but isn't a better and cleaner approach > here to not add another meaning to what is_null_oid() means, but to just > add a HASH_STREAM flag that'll get passed down as "unsigned flags" to > write_loose_object()? See FLAG_BITS in object.h. > > Then the "obj_list[nr].obj" here could also become > "obj_list[nr].obj.flags |= (1u<<12)" or whatever (but that wouldn't > strictly be needed I think. > > But by adding the "HASH_STREAM" flag you could I think stop duplicating > the "Generate the header" etc. here and call write_object_file_flags(). > > I don't so much care about how it's done within unpack-objects.c, but > not having another meaning to is_null_oid() in play would be really > nice, and it this case it seems entirely avoidable. I did refactor it according to your suggestions in my next patch version. Using a HASH_STREAM tag is indeed a better way to deal with it, and it can also reduce my refactor to the original contents. Thanks. -Han Xin