Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Eric Sunshine <sunshine@xxxxxxxxxxxxxx> 于2021年4月6日周二 上午1:05写道:
>
> On Mon, Apr 5, 2021 at 10:01 AM ZheNing Hu via GitGitGadget
> <gitgitgadget@xxxxxxxxx> wrote:
> > When we use `git for-each-ref`, every ref will call
> > `show_ref_array_item()` and allocate its own final strbuf
> > and error strbuf. Instead, we can provide two single strbuf:
> > final_buf and error_buf that get reused for each output.
> > [...]
> > Signed-off-by: ZheNing Hu <adlternative@xxxxxxxxx>
> > ---
>
> Was there a discussion leading up to this change? If so, it may be a
> good idea to provide a link to it in the mailing list here under the
> "---" line.
>

Okay, I will add them in cover-letter next time.

> Some comments below...
>
> > diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> > @@ -22,6 +22,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> >         struct ref_format format = REF_FORMAT_INIT;
> > +       struct strbuf final_buf = STRBUF_INIT;
> > +       struct strbuf error_buf = STRBUF_INIT;
> >
> >         for (i = 0; i < maxcount; i++)
> > -               show_ref_array_item(array.items[i], &format);
> > +               show_ref_array_item(array.items[i], &format, &final_buf, &error_buf);
> >         ref_array_clear(&array);
> >         return 0;
> >  }
>
> The call to strbuf_reset() within show_ref_array_item() does not free
> the memory from the strbufs, so the memory is being leaked. Therefore,
> at the end of this function, you should have:
>
>     strbuf_release(final_buf);
>     strbuf_release(error_buf);
>

Thanks, I ignored this point.

> > diff --git a/builtin/tag.c b/builtin/tag.c
> > @@ -39,6 +39,8 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> >         struct ref_array array;
> > +       struct strbuf final_buf = STRBUF_INIT;
> > +       struct strbuf error_buf = STRBUF_INIT;
> >
> >         for (i = 0; i < array.nr; i++)
> > -               show_ref_array_item(array.items[i], format);
> > +               show_ref_array_item(array.items[i], format, &final_buf, &error_buf);
> >         ref_array_clear(&array);
> >         free(to_free);
>
> Leaking `final_buf` and `error_buf`.
>
> > diff --git a/ref-filter.c b/ref-filter.c
> > @@ -2436,16 +2436,16 @@ int format_ref_array_item(struct ref_array_item *info,
> >  void show_ref_array_item(struct ref_array_item *info,
> > -                        const struct ref_format *format)
> > +                        const struct ref_format *format,
> > +                        struct strbuf *final_buf,
> > +                        struct strbuf *error_buf)
> >  {
> > -       struct strbuf final_buf = STRBUF_INIT;
> > -       struct strbuf error_buf = STRBUF_INIT;
> >
> > -       if (format_ref_array_item(info, format, &final_buf, &error_buf))
> > -               die("%s", error_buf.buf);
> > -       fwrite(final_buf.buf, 1, final_buf.len, stdout);
> > -       strbuf_release(&error_buf);
> > -       strbuf_release(&final_buf);
> > +       if (format_ref_array_item(info, format, final_buf, error_buf))
> > +               die("%s", error_buf->buf);
> > +       fwrite(final_buf->buf, 1, final_buf->len, stdout);
> > +       strbuf_reset(error_buf);
> > +       strbuf_reset(final_buf);
> >         putchar('\n');
> >  }
>
> A couple comments:
>
> It is especially ugly that `error_buf` needs to be passed in by the
> caller since it is only ever used in case of an error, at which point
> the program will die() anyhow, so it's not on a critical,
> speed-sensitive path. The initialization with STRBUF_INIT is
> practically cost-free, so this variable could easily stay local to
> this function without cost-penalty rather than forcing the caller to
> pass it in. (This assumes that none of the consumers of `error_buf`
> down the line insert into the buffer unnecessarily, which is probably
> a reasonable assumption.)
>

What you said makes sense. The `error_buf` may not need to be passed
as a parameter, because errors are generally less.

> It is an unsafe assumption to only call strbuf_reset() at the end of
> the function. For this to be robust, you can't assume that the caller
> has given you an empty strbuf. Instead, you must ensure it by calling
> strbuf_reset() at the start. (It doesn't hurt to also call
> strbuf_reset() at the end, but that is not critical to correct
> operation, so could be omitted.)
>

Well, indeed, it would be better to use `strbuf_reset()` first, as it do in
`cat-file.c`.

> > @@ -2453,9 +2453,12 @@ void pretty_print_ref(const char *name, const struct object_id *oid,
> >         struct ref_array_item *ref_item;
> > +       struct strbuf final_buf = STRBUF_INIT;
> > +       struct strbuf error_buf = STRBUF_INIT;
> > +
> >         ref_item = new_ref_array_item(name, oid);
> >         ref_item->kind = ref_kind_from_refname(name);
> > -       show_ref_array_item(ref_item, format);
> > +       show_ref_array_item(ref_item, format, &final_buf, &error_buf);
> >         free_array_item(ref_item);
> >  }
>
> Leaking `final_buf` and `error_buf`.
>
> > diff --git a/ref-filter.h b/ref-filter.h
> > @@ -120,7 +120,10 @@ int format_ref_array_item(struct ref_array_item *info,
> >  /*  Print the ref using the given format and quote_style */
> > -void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
> > +void show_ref_array_item(struct ref_array_item *info,
> > +                        const struct ref_format *format,
> > +                        struct strbuf *final_buf,
> > +                        struct strbuf *error_buf);
>
> It is not clear to the person reading this what these two new
> arguments are for or what should be provided, so the comment above the
> function deserves an update explaining what these arguments are for
> and how to provide them. This is especially important since this is a
> public function.
>
> If this function was merely internal to some builtin command, this
> sort of change for the sake of optimization might not be so bad, but
> as a public function, it comes across as rather ugly. In general, we
> don't necessarily want to pollute an otherwise clean API with changes
> which make the API uglier merely for the sake of tiny optimizations
> like this (IMHO). The extra burden placed on callers by this change,
> coupled with the ugliness this introduces into the API, makes the
> change seem less than desirable.
>

Well, for the time being, there are relatively few places
where`show_ref_array_item()`
is used in the entire git repository. I may need to pay attention to this later:
the ease of use of public interfaces is also important.

> One way you might be able to mitigate the undesirableness would be to
> have two versions of this function. For instance:
>
>     /* Print the ref using the given format and quote_style */
>     show_ref_array_item(...);
>    /* This is like show_ref_array_item() but ... */
>     show_ref_array_item_optim(...);
>
> The comment of the second function would, of course, need to explain
> that it is similar to show_ref_array_item() but more optimal because
> <some reasons> and that the caller is responsible for allocating and
> releasing the strbufs, and that the strbufs are used only for
> temporary storage, thus the caller should not assume anything about
> them.
>

Yes, this will ensure that this new public interface will not be misused.

> This way, callers which don't invoke show_ref_array_item() in a tight
> loop don't need to be burdened by the new arguments (and won't have to
> remember to release them), and callers with a tight loop can take
> advantage of the optimization with the bit of extra work of having to
> declare and release the strbufs.
>

For external calls, reasonable release of strbuf does require attention,
which is indeed a disadvantage at some time.

> So, having said all that, it's not clear that the ugliness and extra
> work are worth the gain. However, if it is decided that the change is
> worthwhile, then the commit message probably should explain cases in
> which such an optimization will be really beneficial.

I now estimate that the optimization brought here may appear in a more refs
git repo like `linux.git`. I have to experiment first.

Thanks, Eric.

--
ZheNing Hu




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux