Re: Optimization for "git clean -ffdx"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Elijah,

Thanks for this nice and very detailed email!

On Sun, Feb 13, 2022 at 4:25 AM Elijah Newren <newren@xxxxxxxxx> wrote:
> There are two steps -- collect the list of things that are removable,
> and then a separate step to remove the things that are removable.  It
> can be the case that we recurse into the same directory twice, once
> for each step.  For the first step, see the fill_directory() call.  If
> you left off "-x" or only had one "-f", or your directories had some
> tracked files, then fill_directory()'s job is finding out which things
> are removable and it'd likely only be a subset of those directories.
> For the second step, some of those removable things may be entire
> directories.  So when it hits one of those and later calls
> remove_dirs() to remove it, it has to recurse again.

I am not sure I understand perfectly why there are 2 steps but this is detail.

The observation I had in the second step for removal, it is this type
of patterns using strace:
getcwd("/tmp/repo", 129) = 46
newfstatat(AT_FDCWD, "/tmp/repo/a", {st_mode=S_IFDIR|0700,
st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/tmp/repo/a/a", {st_mode=S_IFDIR|0700,
st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/tmp/repo/a/a/a", {st_mode=S_IFDIR|0700,
st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/tmp/repo/a/a/a/a", {st_mode=S_IFDIR|0700,
st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
rmdir("a/a/a/a")                        = 0
newfstatat(AT_FDCWD, "a/a/a/b", {st_mode=S_IFDIR|0700, st_size=4096,
...}, AT_SYMLINK_NOFOLLOW) = 0
openat(AT_FDCWD, "a/a/a/b", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 6
newfstatat(6, "", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_EMPTY_PATH) = 0
getdents64(6, 0x55b718d8d650 /* 2 entries */, 32768) = 48
getdents64(6, 0x55b718d8d650 /* 0 entries */, 32768) = 0
close(6)                                = 0
getcwd("/tmp/repo", 129) = 46
newfstatat(AT_FDCWD, "/tmp/repo/a", {st_mode=S_IFDIR|0700,
st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/tmp/repo/a/a", {st_mode=S_IFDIR|0700,
st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/tmp/repo/a/a/a", {st_mode=S_IFDIR|0700,
st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/tmp/repo/a/a/a/b", {st_mode=S_IFDIR|0700,
st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
rmdir("a/a/a/b")                        = 0

Here I have a repository with quite some nested and untracked directories.
We can see that many "newfstatat" are repeated for the same path. Also
"getcwd" is repeated many times.
I guess here "strbuf_realpath" would benefit from some caching.


> > Using "-ff" should not check for nested .git and no need to recurse if
> > the directory is already untracked.
>
> Not quite; the -x and lack of -e options is critical here, otherwise
> we cannot just nuke the whole directory.  (Also, -d is important, but
> that just kind of goes without saying.)

Indeed, you are right.


> Further, setting this flag is only solving part of the performance
> problem.  We'll still recurse into the directories to look for ignored
> files, which should be avoided since we don't need to differentiate.
> That basically means that we need to avoid all the code in the current
>
>    if (remove_directories && !ignored_only)
>
> block, whenever (remove_directories && ignored && !exclude_list.nr &&
> force > 1).

Good point!


> > Another thfing to note is that it shows "Removing XXX" but it shows it
> > when the directory is already gone. So we could change to "Removed
> > XXX" or display the "Removing XXX" before starting to remove the
> > directory.
>
> I commented on Bagas' patch, but I think this is minutiae that won't
> be relevant to the end user, and would rather either ignore it or just
> move the print statement earlier rather than increasing work for
> translators.  That is, unless we tend to use past tense elsewhere in
> the UI and we want to make a concerted effort to convert to using past
> tense.

If we want to be correct, then we should then move the "Removing"
printing before but then it makes the "gone" flag a bit useless.
I will defer to you if this is something we want to fix.


> Anyway, I'll be happy to review your patch(es) and
> will take whichever parts you don't want to tackle.

If you don't mind, I will let you adjust the documentation.
Following the patch series for the 2 other changes.

Thanks a lot Elijah. Very appreciated!
--
Patrick Marlier



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux