Re: [PATCH v2 6/8] dir: fix checks on common prefix directory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Dec 16, 2019 at 10:13 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Elijah Newren <newren@xxxxxxxxx> writes:
>
> >> > > +     memset(&cdir, 0, sizeof(cdir));
> >> > > +     memset(&de, 0, sizeof(de));
> >> > > +     cdir.de = &de;
> >> > > +     de.d_type = DT_DIR;
> >> >
> >> > So here, `de` is zeroed out, and therefore `de.d_name` is `NULL`.
> >>
> >> Um, yeah...didn't I have an allocation of de.d_name here?  It will
> >> always have a subset of path copied into it, so an allocation of len+1
> >> is plenty long enough.
> >
> > Actually, it looks like I looked up the definition of dirent
> > previously and forgot by the time you emailed.  On linux, from
> > /usr/include/bits/dirent.h:
> >
> > struct dirent
> >   {
> >     ....
> >     unsigned char d_type;
> >     char d_name[256];           /* We must not include limits.h! */
> >   };
> >
> > ...
>
> Uh, oh.  The size of "struct dirent" is unspecified and it is asking
> for trouble to allocate one yourself (iow, treat it pretty much as
> something you can only get a pointer to an instance from readdir()).
> For example, a dirent that comes back readdir() may have a lot
> longer name than the sizeof(.d_name[]) above may imply.
>
> Do you really need to manufacture a dirent yourself, or can you use
> a more concrete type you invent yourself?

I need to manufacture a dirent myself; short of that, the most likely
alternative is to drop patches 2 & 5-8 of this series and throw my
hands in the air and give up.  That probably deserves an
explanation...

Years ago someone noticed that if a user ran "git ls-files -o
foo/bar/one foo/bar/two", that we could try to optimize by noticing
that we won't be interested in anything until we get to foo/bar/.  So,
they tried to short-circuit the read_directory_recursive() and
readdir() calls, but couldn't reuse the same treat_path() logic to
check that we should even go into foo/bar/ at all.  So there was some
copy & paste from treat_path() into a new treat_leading_path()...and
that both missed some important parts and the logic further diverged
over time.

This patch was about categorizing the suite of bugs that arose from
not using treat_path() for checks from both codepaths, and tried to
correct those problems.  treat_path() takes a dirent, and several of
the functions it calls all take a dirent.  It'd be an awful lot of
work to rip it out.  So I manufactured a dirent myself so that we
could use the same codepaths and not only fix all these bugs but
prevent future ones as well.  If we can't manufacture a dirent, then
unless someone else has some bright ideas about something clever we
can do, then I think this problem blows up in complexity to a level
where I don't think it's worth addressing.

I almost ripped the optimization out altogether (just how much do we
really save by not looking into the leading two directories?), except
that unpack_trees() calls into the same code with a leading path and I
didn't want to mess with that.

Any bright ideas about what to do here?



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux