On Thu, Apr 05, 2018 at 12:15:59PM -0700, Elijah Newren wrote: > On Thu, Apr 5, 2018 at 11:58 AM, Jeff King <peff@xxxxxxxx> wrote: > > > It sounds like correct_untracked_entries() is doing the wrong thing, and > > it should be aware of the pathspec-matching when culling entries. In > > other words, my understanding was that read_directory() does not > > necessarily promise to cull fully (which is what led to cf424f5fd in the > > first place), and callers are forced to apply their own pathspecs. > > > > The distinction is academic for this particular bug, but it makes me > > wonder if there are other cases where "clean" needs to be more careful > > with what comes out of dir.c. > > Interesting, I read things very differently. Looking back at commit > 6b1db43109ab ("clean: teach clean -d to preserve ignored paths", > 2017-05-23), which introduced correct_untracked_entries(), I thought > that correct_untracked_entries() wasn't there to correct pathspec > issues with fill_directory(), but instead to special case the handling > of files which are both untracked and ignored. Did I mis-read or were > there other commits that changed the semantics? > > Also, it would just seem odd to me that fill_directory() requires > pathspecs, and it uses those pathspecs, but it doesn't guarantee that > the files it returns matches them. That seems like an API ripe for > mis-use, especially since I don't see any comment in the code about > such an assumption. Is that really the expectation? To be honest, I don't know. Most of dir.c predates me, and I've tried to avoid looking at it too hard. But I had a vague recollection of it being "best effort", and this bit from cf424f5fd89b reinforces that: However, read_directory does not actually check against our pathspec. It uses a simplified version that may turn up false positives. As a result, we need to check that any hits match our pathspec. So I don't know that correct_untracked_entries() is there to fix the pathspec handling. But I think that anybody who looks at the output of fill_directory() does need to be aware that they may get more entries than they expected, and has to apply the pathspecs themselves. And that's what that extra dir_path_match() call in cmd_clean() is there for (it used to be match_pathspec before some renaming). I agree it's an error-prone interface. I don't know all the conditions under which dir.c might return extra entries, but it seems like it might be sane for it to do a final pathspec-matching pass so that callers don't have to. That would mean that correct_untracked_entries() sees the correctly culled list, and the extra check in cmd_clean() could be dropped. Ideally, of course, we'd fix those individual cases, since that would be more efficient. And your patch may be the right first step in that direction. But since we don't know what all of them are, it seems ripe for regressions. -Peff