Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Sep 30, 2021 at 7:15 AM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
> On Wed, Sep 29 2021, Elijah Newren wrote:
>
> > On Wed, Sep 29, 2021 at 11:32 AM Ævar Arnfjörð Bjarmason
> > <avarab@xxxxxxxxx> wrote:
> >>
> >> On Wed, Sep 29 2021, Elijah Newren wrote:
> >>
...
> > As per the next patch:
> >
> > int unpack_trees(..., struct unpack_trees_options *o)
> > {
> >     struct dir_struct dir = DIR_INIT;
> >     ...
> >     if (!o->preserve_ignored) {
> >         /* Setup 'dir', make o->dir point to it */
> >         ....
> >         o->dir = &dir;
> >     }
> >     ...
> >     if (o->dir)
> >         /* cleanup */
> >     ....
> > }
> >
> > The caller doesn't touch o->dir (other than initializing it to zeros);
> > unpack_trees() is wholly responsible for it.  I'd kind of like to
> > entirely remove dir from unpack_trees_options(), but I need a way of
> > passing it down through all the other functions in unpack-trees.c, and
> > leaving it in unpack_trees_options seems the easiest way to do so.  So
> > I just marked it as "for internal use only".
>
> I think I understand *how* it works, I'm puzzled by why you went for
> this whole level of indirection when you're using a struct on the stack
> in the end anyway, just ... put that in "struct unpack_trees_options"?
>
> Anyway, I see I have only myself to blame here, as you added these leak
> fixes in the v2 in response to some of my offhand comments.
>
> FWIW I then went on to do some deeper fixes not just on these leaks but
> the surrounding leaks, which will be blocked by 2/11 & 05/11 of this
> topic for a while. I suppose I only have myself to blame :)
>
> Below is a patch-on-top that I think makes this whole thing much simpler
> by doing away with the pointer entirely.
>
> I suppose this is also a partial reply to
> https://lore.kernel.org/git/CABPp-BG_qigBoirMGR-Yk9Niyxt0UmYCEqojsYxbSEarLAmraA@xxxxxxxxxxxxxx/;
> but I quite dislike this pattern of including a pointer like this where
> it's not needed just for the practicalities of memory management.
>
> I.e. here you use DIR_INIT. In my local patches to fix up the wider
> memory leaks in this area I've got DIR_INIT also using a STRBUF_INIT,
> and DIR_INIT will in turn be referenced by a
> UNPACK_TREES_OPTIONS_INIT. It's quite nice if you're having to
> initialize with "UNPACK_TREES_OPTIONS_INIT" have that initialization
> work all the way down the chain, and not need e.g. a manual
> strbuf_init(), dir_init() etc.

And you can keep using UNPACK_TREES_OPTIONS_INIT, because the
unpack_trees_opts->dir should be initialized to NULL.

> I removed the dir_init() in ce93a4c6127 (dir.[ch]: replace dir_init()
> with DIR_INIT, 2021-07-01),

I might be going on a tangent here, but looking at that patch, I'm
worried that dir_init() was buggy and that you perpetuated that bug
with DIR_INIT.  Note that dir_struct has a struct strbuf basebuf
member, which neither dir_init() or DIR_INIT initialize properly
(using either strbuf_init() or STRBUF_INIT).  As far as I can tell,
dir.c relies on either strbuf_add() calls to just happen to work with
this incorrectly initialized strbuf, or else use the strbuf_init()
call in prep_exclude() to do so, using the following snippet:

    if (!dir->basebuf.buf)
        strbuf_init(&dir->basebuf, PATH_MAX);

However, earlier in that same function we see

    if (stk->baselen <= baselen &&
        !strncmp(dir->basebuf.buf, base, stk->baselen))
            break;

So either that function can never have dir->basebuf.buf be NULL and
the strbuf_init() is dead code, or else it's possible for us to
trigger a segfault.  If it's the former, it may just be a ticking time
bomb that will transform into the latter with some other change,
because it's not at all obvious to me how dir->basebuf gets
initialized appropriately to avoid that strncmp call.  Perhaps there
is some invariant where exclude_stack is only set up by previous calls
to prep_exclude() and those won't set up exclude_stack until first
initializing basebuf.  But that really at least deserves a comment
about how we're abusing basebuf, and would probably be cleaner if we
initialized basebuf to STRBUF_INIT.

> but would probably need to bring it back, of

If you need to bring it back, it's unrelated to my changes here, and
would only be because of the lack of basebuf initialization above.

> course you need some "release()" method for the
> UNPACK_TREES_OPTIONS_INIT, which in turn needs to call the dir_release()
> (well, "dir_clear()" in that case), and it needs to call
> "strbuf_release()". It's just nicer if that boilerplate is all on
> destruction, but not also on struct/object setup.

The caller should *not* be initializing or tearing down
unpack_trees_options->dir beyond setting that field to NULL; it should
then leave it alone.

> We do need that setup in some cases (although a lot could just be
> replaced by lazy initialization), but if we don't....
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index a7e1712d236..de5cc6cd025 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1694,15 +1694,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>         static struct cache_entry *dfc;
>         struct pattern_list pl;
>         int free_pattern_list = 0;
> -       struct dir_struct dir = DIR_INIT;
>
>         if (o->reset == UNPACK_RESET_INVALID)
>                 BUG("o->reset had a value of 1; should be UNPACK_TREES_*_UNTRACKED");
>
>         if (len > MAX_UNPACK_TREES)
>                 die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
> -       if (o->dir)
> -               BUG("o->dir is for internal use only");

I think this was an important check that you've tossed without
replacement.  Historically, callers set up and tweaked o->dir with
various values.  With my patch, we are no longer allowing that..which
introduces a transition problem -- people might have written or are
now writing patches that make new calls of unpack_trees() previous to
this change of mine, but submit them after this change of mine gets
merged.  Without this check I added, they'd probably just do a
mechanical `o->dir->` change to `o->dir.` and assume it's good...and
then possibly have ugly bugs to hunt down.

So, I think it's helpful to have a check that provides an early
warning that tweaking o->dir is not only no longer required, but also
no longer allowed.

>         trace_performance_enter();
>         trace2_region_enter("unpack_trees", "unpack_trees", the_repository);
> @@ -1718,9 +1715,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>                 BUG("UNPACK_RESET_OVERWRITE_UNTRACKED incompatible with preserved ignored files");
>
>         if (!o->preserve_ignored) {
> -               o->dir = &dir;
> -               o->dir->flags |= DIR_SHOW_IGNORED;
> -               setup_standard_excludes(o->dir);
> +               o->dir.flags |= DIR_SHOW_IGNORED;
> +               setup_standard_excludes(&o->dir);
>         }
>
>         if (!core_apply_sparse_checkout || !o->update)
> @@ -1884,10 +1880,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  done:
>         if (free_pattern_list)
>                 clear_pattern_list(&pl);
> -       if (o->dir) {
> -               dir_clear(o->dir);
> -               o->dir = NULL;
> -       }
> +       dir_clear(&o->dir);

Unconditionally calling dir_clear()...

>         trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
>         trace_performance_leave("unpack_trees");
>         return ret;
> @@ -2153,8 +2146,7 @@ static int verify_clean_subdirectory(const struct cache_entry *ce,
>         pathbuf = xstrfmt("%.*s/", namelen, ce->name);
>
>         memset(&d, 0, sizeof(d));
> -       if (o->dir)
> -               d.exclude_per_dir = o->dir->exclude_per_dir;
> +       d.exclude_per_dir = o->dir.exclude_per_dir;
>         i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
>         if (i)
>                 return add_rejected_path(o, ERROR_NOT_UPTODATE_DIR, ce->name);
> @@ -2201,8 +2193,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
>         if (ignore_case && icase_exists(o, name, len, st))
>                 return 0;
>
> -       if (o->dir &&
> -           is_excluded(o->dir, o->src_index, name, &dtype))
> +       if (is_excluded(&o->dir, o->src_index, name, &dtype))

Unconditionally calling is_excluded()...

>                 /*
>                  * ce->name is explicitly excluded, so it is Ok to
>                  * overwrite it.
> diff --git a/unpack-trees.h b/unpack-trees.h
> index 71ffb7eeb0c..a8afbb20170 100644
> --- a/unpack-trees.h
> +++ b/unpack-trees.h
> @@ -5,6 +5,7 @@
>  #include "strvec.h"
>  #include "string-list.h"
>  #include "tree-walk.h"
> +#include "dir.h"
>
>  #define MAX_UNPACK_TREES MAX_TRAVERSE_TREES
>
> @@ -95,7 +96,7 @@ struct unpack_trees_options {
>         struct index_state result;
>
>         struct pattern_list *pl; /* for internal use */
> -       struct dir_struct *dir; /* for internal use only */
> +       struct dir_struct dir; /* for internal use only */
>         struct checkout_metadata meta;
>  };
>

Not only did you drop the important safety check that o->dir not be
setup by the caller (which needs to be reinstated in some form), your
solution also involves unconditionally calling dir_clear() and
is_excluded().  It is not clear to me that those calls are safe...and
that they will continue to be safe in the future.  Even if it is safe
and will continue to be, I don't think this should be squashed into my
patches.  I think it should be a separate patch with its own commit
message that explicitly calls out this assumption.  Especially since
this is dir.c, which is an area where attempting to fix one very
simple little bug results in years of refactoring and fixing all kinds
of historical messes, sometimes waiting a year and a half for
responses to RFCs/review requests, and where we have to sometimes just
give up on attempting to understand the purpose of various bits of
code and instead rely on the regression tests and hope they are good
enough.  I still think that dir.c deserves a little warning at the
top, like the one I suggested in [1].

[1] https://lore.kernel.org/git/CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@xxxxxxxxxxxxxx/




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux