Hi Elijah, Thanks for this nice and very detailed email! On Sun, Feb 13, 2022 at 4:25 AM Elijah Newren <newren@xxxxxxxxx> wrote: > There are two steps -- collect the list of things that are removable, > and then a separate step to remove the things that are removable. It > can be the case that we recurse into the same directory twice, once > for each step. For the first step, see the fill_directory() call. If > you left off "-x" or only had one "-f", or your directories had some > tracked files, then fill_directory()'s job is finding out which things > are removable and it'd likely only be a subset of those directories. > For the second step, some of those removable things may be entire > directories. So when it hits one of those and later calls > remove_dirs() to remove it, it has to recurse again. I am not sure I understand perfectly why there are 2 steps but this is detail. The observation I had in the second step for removal, it is this type of patterns using strace: getcwd("/tmp/repo", 129) = 46 newfstatat(AT_FDCWD, "/tmp/repo/a", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0 newfstatat(AT_FDCWD, "/tmp/repo/a/a", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0 newfstatat(AT_FDCWD, "/tmp/repo/a/a/a", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0 newfstatat(AT_FDCWD, "/tmp/repo/a/a/a/a", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0 rmdir("a/a/a/a") = 0 newfstatat(AT_FDCWD, "a/a/a/b", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0 openat(AT_FDCWD, "a/a/a/b", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 6 newfstatat(6, "", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_EMPTY_PATH) = 0 getdents64(6, 0x55b718d8d650 /* 2 entries */, 32768) = 48 getdents64(6, 0x55b718d8d650 /* 0 entries */, 32768) = 0 close(6) = 0 getcwd("/tmp/repo", 129) = 46 newfstatat(AT_FDCWD, "/tmp/repo/a", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0 newfstatat(AT_FDCWD, "/tmp/repo/a/a", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0 newfstatat(AT_FDCWD, "/tmp/repo/a/a/a", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0 newfstatat(AT_FDCWD, "/tmp/repo/a/a/a/b", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0 rmdir("a/a/a/b") = 0 Here I have a repository with quite some nested and untracked directories. We can see that many "newfstatat" are repeated for the same path. Also "getcwd" is repeated many times. I guess here "strbuf_realpath" would benefit from some caching. > > Using "-ff" should not check for nested .git and no need to recurse if > > the directory is already untracked. > > Not quite; the -x and lack of -e options is critical here, otherwise > we cannot just nuke the whole directory. (Also, -d is important, but > that just kind of goes without saying.) Indeed, you are right. > Further, setting this flag is only solving part of the performance > problem. We'll still recurse into the directories to look for ignored > files, which should be avoided since we don't need to differentiate. > That basically means that we need to avoid all the code in the current > > if (remove_directories && !ignored_only) > > block, whenever (remove_directories && ignored && !exclude_list.nr && > force > 1). Good point! > > Another thfing to note is that it shows "Removing XXX" but it shows it > > when the directory is already gone. So we could change to "Removed > > XXX" or display the "Removing XXX" before starting to remove the > > directory. > > I commented on Bagas' patch, but I think this is minutiae that won't > be relevant to the end user, and would rather either ignore it or just > move the print statement earlier rather than increasing work for > translators. That is, unless we tend to use past tense elsewhere in > the UI and we want to make a concerted effort to convert to using past > tense. If we want to be correct, then we should then move the "Removing" printing before but then it makes the "gone" flag a bit useless. I will defer to you if this is something we want to fix. > Anyway, I'll be happy to review your patch(es) and > will take whichever parts you don't want to tackle. If you don't mind, I will let you adjust the documentation. Following the patch series for the 2 other changes. Thanks a lot Elijah. Very appreciated! -- Patrick Marlier