Re: [PATCH v4 5/5] unpack-objects: unpack_non_delta_entry() read data in a stream

Han Xin <chiyutianyi@xxxxxxxxx> · Tue, 7 Dec 2021 14:48:51 +0800

On Fri, Dec 3, 2021 at 10:29 PM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
>
> On Fri, Dec 03 2021, Han Xin wrote:
>
> > From: Han Xin <hanxin.hx@xxxxxxxxxxxxxxx>
> > [..]
> > +static void write_stream_blob(unsigned nr, unsigned long size)
> > +{
> > +     char hdr[32];
> > +     int hdrlen;
> > +     git_zstream zstream;
> > +     struct input_zstream_data data;
> > +     struct input_stream in_stream = {
> > +             .read = feed_input_zstream,
> > +             .data = &data,
> > +             .size = size,
> > +     };
> > +     struct object_id *oid = &obj_list[nr].oid;
> > +     int ret;
> > +
> > +     memset(&zstream, 0, sizeof(zstream));
> > +     memset(&data, 0, sizeof(data));
> > +     data.zstream = &zstream;
> > +     git_inflate_init(&zstream);
> > +
> > +     /* Generate the header */
> > +     hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX, type_name(OBJ_BLOB), (uintmax_t)size) + 1;
> > +
> > +     if ((ret = write_loose_object(oid, hdr, hdrlen, &in_stream, 0, 0)))
> > +             die(_("failed to write object in stream %d"), ret);
> > +
> > +     if (zstream.total_out != size || data.status != Z_STREAM_END)
> > +             die(_("inflate returned %d"), data.status);
> > +     git_inflate_end(&zstream);
> > +
> > +     if (strict && !dry_run) {
> > +             struct blob *blob = lookup_blob(the_repository, oid);
> > +             if (blob)
> > +                     blob->object.flags |= FLAG_WRITTEN;
> > +             else
> > +                     die("invalid blob object from stream");
> > +     }
> > +     obj_list[nr].obj = NULL;
> > +}
>
> Just a side-note, I think (but am not 100% sure) that these existing
> occurances aren't needed due to our use of CALLOC_ARRAY():
>
>     diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
>     index 4a9466295ba..00b349412c5 100644
>     --- a/builtin/unpack-objects.c
>     +++ b/builtin/unpack-objects.c
>     @@ -248,7 +248,6 @@ static void write_object(unsigned nr, enum object_type type,
>                             die("failed to write object");
>                     added_object(nr, type, buf, size);
>                     free(buf);
>     -               obj_list[nr].obj = NULL;
>             } else if (type == OBJ_BLOB) {
>                     struct blob *blob;
>                     if (write_object_file(buf, size, type_name(type),
>     @@ -262,7 +261,6 @@ static void write_object(unsigned nr, enum object_type type,
>                             blob->object.flags |= FLAG_WRITTEN;
>                     else
>                             die("invalid blob object");
>     -               obj_list[nr].obj = NULL;
>             } else {
>                     struct object *obj;
>                     int eaten;
>
> The reason I'm noting it is that the same seems to be true of your new
> addition here. I.e. are these assignments to NULL needed?
>
> Anyway, the reason I started poking at this it tha this
> write_stream_blob() seems to duplicate much of write_object(). AFAICT
> only the writing part is really different, the part where we
> lookup_blob() after, set FLAG_WRITTEN etc. is all the same.
>
> Why can't we call write_object() here?
>
> The obvious answer seems to be that the call to write_object_file()
> isn't prepared to do the sort of streaming that you want, so instead
> you're bypassing it and calling write_loose_object() directly.
>
> I haven't tried this myself, but isn't a better and cleaner approach
> here to not add another meaning to what is_null_oid() means, but to just
> add a HASH_STREAM flag that'll get passed down as "unsigned flags" to
> write_loose_object()? See FLAG_BITS in object.h.
>
> Then the "obj_list[nr].obj" here could also become
> "obj_list[nr].obj.flags |= (1u<<12)" or whatever (but that wouldn't
> strictly be needed I think.
>
> But by adding the "HASH_STREAM" flag you could I think stop duplicating
> the "Generate the header" etc. here and call write_object_file_flags().
>
> I don't so much care about how it's done within unpack-objects.c, but
> not having another meaning to is_null_oid() in play would be really
> nice, and it this case it seems entirely avoidable.

I did refactor it according to your suggestions in my next patch version.
Using a HASH_STREAM tag is indeed a better way to deal with it, and it
can also reduce my refactor to the original contents.

Thanks.
-Han Xin