Re: [PATCH v2 00/27] revision.[ch]: add and use release_revisions()

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 24 Mar 2022 18:41:15 -0700

Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:

> This is a re-roll of the v1 of [1] now that 7391ecd338e (Merge branch
> 'ds/partial-bundles', 2022-03-21) has landed, which it had a conflict
> with. I believe this v2 addresses all the feedback brought up on v1.

Thanks.  It was mostly a pleasant read but with a huge caveat---I
have no confidence in the code that would not try to release what it
did not allocate (simply because I do not have time to audit all
allocations to various members of rev-info structure).  But at least
if we try to free something we borrowed from say command line, we'd
immediately get a crash so with enough cooking and guinea-pig testing,
such a bug would be easy to catch.

I suspect cmd_show() still is leaky when fed a few commits.

    int cmd_show(int argc, const char **argv, const char *prefix)
    {
            struct rev_info rev;
            struct object_array_entry *objects;
            struct setup_revision_opt opt;
            struct pathspec match_all;
            int i, count, ret = 0;

            init_log_defaults();
            git_config(git_log_config, NULL);

            memset(&match_all, 0, sizeof(match_all));
            repo_init_revisions(the_repository, &rev, prefix);
            ...
            cmd_log_init(argc, argv, prefix, &rev, &opt);

            if (!rev.no_walk)
                    return cmd_log_deinit(cmd_log_walk(&rev), &rev);

At this point, we only have positive objects in pending and rev.no_walk
is set.

            count = rev.pending.nr;
            objects = rev.pending.objects;

We grab .objects and we "show" each one of them.

            for (i = 0; i < count && !ret; i++) {
                    struct object *o = objects[i].item;
                    const char *name = objects[i].name;
                    switch (o->type) {
                    ...

There are ways to show different types of objects, but what we are
interested in is what happens to commits.

                    case OBJ_COMMIT:
                            rev.pending.nr = rev.pending.alloc = 0;
                            rev.pending.objects = NULL;

We clear pending.objects because we want to reuse the rev_info we
already have here (things like "git show -U8 master maint" is harder
to do if we do not allow the reuse), and push this single commit to
the pending array, and let log_walk() machinery to "show" it (note
that we do not "walk", as we have the .no_walk bit set).

                            add_object_array(o, name, &rev.pending);
                            ret = cmd_log_walk(&rev);

After we are done with this "traversal" of a single object, we
process another element in the original "objects" list, which was
taken from the original rev.pending.objects[] array, and we can keep
going, because nobody else has access to objects[] array.

I do not offhand recall what other members of rev_info
cmd_log_walk() touches, but we are not clearing them.  Not clearing
rev.pending.objects here means we can be leaking the newly created
one-element object array to hold this single object 'o', UNLESS it
is the last object in the original objects[] array.

                            break;
                    default:
                            ret = error(_("unknown type: %d"), o->type);
                    }
            }

And even worse, we used to free objects[] here, but we no longer do.
Instead we let cmd_log_deinit() clear rev.pending.objects[], which
is the same as objects[] if and only if there are NO commit objects
on the command line.  If there is even one commit object on the
command line, we would have cleared rev.pending.objects and created
a new one, which is what is cleared by the call to cmd_log_deinit().
The original one, pointed by the local variable objects[], is no
longer freed.

            return cmd_log_deinit(ret, &rev);
    }