On Mon, Dec 16, 2019 at 10:13 AM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > Elijah Newren <newren@xxxxxxxxx> writes: > > >> > > + memset(&cdir, 0, sizeof(cdir)); > >> > > + memset(&de, 0, sizeof(de)); > >> > > + cdir.de = &de; > >> > > + de.d_type = DT_DIR; > >> > > >> > So here, `de` is zeroed out, and therefore `de.d_name` is `NULL`. > >> > >> Um, yeah...didn't I have an allocation of de.d_name here? It will > >> always have a subset of path copied into it, so an allocation of len+1 > >> is plenty long enough. > > > > Actually, it looks like I looked up the definition of dirent > > previously and forgot by the time you emailed. On linux, from > > /usr/include/bits/dirent.h: > > > > struct dirent > > { > > .... > > unsigned char d_type; > > char d_name[256]; /* We must not include limits.h! */ > > }; > > > > ... > > Uh, oh. The size of "struct dirent" is unspecified and it is asking > for trouble to allocate one yourself (iow, treat it pretty much as > something you can only get a pointer to an instance from readdir()). > For example, a dirent that comes back readdir() may have a lot > longer name than the sizeof(.d_name[]) above may imply. > > Do you really need to manufacture a dirent yourself, or can you use > a more concrete type you invent yourself? I need to manufacture a dirent myself; short of that, the most likely alternative is to drop patches 2 & 5-8 of this series and throw my hands in the air and give up. That probably deserves an explanation... Years ago someone noticed that if a user ran "git ls-files -o foo/bar/one foo/bar/two", that we could try to optimize by noticing that we won't be interested in anything until we get to foo/bar/. So, they tried to short-circuit the read_directory_recursive() and readdir() calls, but couldn't reuse the same treat_path() logic to check that we should even go into foo/bar/ at all. So there was some copy & paste from treat_path() into a new treat_leading_path()...and that both missed some important parts and the logic further diverged over time. This patch was about categorizing the suite of bugs that arose from not using treat_path() for checks from both codepaths, and tried to correct those problems. treat_path() takes a dirent, and several of the functions it calls all take a dirent. It'd be an awful lot of work to rip it out. So I manufactured a dirent myself so that we could use the same codepaths and not only fix all these bugs but prevent future ones as well. If we can't manufacture a dirent, then unless someone else has some bright ideas about something clever we can do, then I think this problem blows up in complexity to a level where I don't think it's worth addressing. I almost ripped the optimization out altogether (just how much do we really save by not looking into the leading two directories?), except that unpack_trees() calls into the same code with a leading path and I didn't want to mess with that. Any bright ideas about what to do here?