[RFC] git gc "--prune=now" semantics considered harmful

So this is an RFC patch. I'm not sure how much people really care, but I 
find the current behavior of "git gc --prune=now" to be unnecessarily 
dangerous.

There are two issues with it:

 (a) parse_expiry_date() considers "now" to be special, and it actually
     doesn't mean "now" at all, it means "everything".

 (b) the date parsing isn't actually done "now", it's done *after* gc has 
     already done its repacking, when we run "git prune --expire". So even 
     if (a) wasn't true, "--prune=now" wouldn't actually mean "now" when 
     the user expects it to happen, but "after doing repacking".

I actually think that the "parse_expiry_date()" behavior makes sense 
within the context of "git prune --expire", so I'm not really complaining 
about (a) per se. I just think that what makes sense within the context of 
"git prune" does *not* necessarily make sense within the context of "git 
gc".

Why do I care? I end up doing lots of random things in my local 
repository, and I also end up wanting to keep my repository fairly clean, 
so I tend to do "git gc --prune=now" to just make sure everything is 
packed and I've gotten rid of all the temporary stuff that so often 
happens when doing lots of git merges (which is what I do). 

You won't see those temporary objects for the usual trivial merges, but 
whenever you have a real recursive merge with automated conflict 
resolution, there will be things like those temporary merge-only objects 
for the 3-way base merge state. 

Does my use pattern of "git gc --prune=now" make sense? Maybe not. But 
it's what I've gotten used to, and it's at least not entirely insane.

But at least once now, I've done that "git gc" at the end of the day, and 
a new pull request comes in, so I do the "git pull" without even thinking 
about the fact that "git gc" is still running.

And then the "--prune=now" behavior is actually really pretty dangerous. 
Because it will prune *all* unreachable objects, even if they are only 
*currently* unreachable because they are in the process of being unpacked 
by the concurrent "git fetch" (and I didn't check - I might just have been 
unlucky, but I think "git prune" ignores FETCH_HEAD).

So I actually would much prefer that for git gc, "--prune=now" means

 (a) "now"

 (b) now at the _start_ of the "git gc" operation, not the time at
     the _end_ of the operation when we've already spent a minute or
     two doing repacking and are now doing the final pruning.
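
To make the difference concrete, here is a minimal sketch in plain C. It is 
a simplified model, not git's actual code: the helper names, the 
EXPIRE_EVERYTHING constant and the pre-parsed "parsed_value" argument are 
all made up for illustration. It assumes prune drops an unreachable object 
when its mtime is at or before the cutoff.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for "expire everything"; not git's real constant. */
#define EXPIRE_EVERYTHING INT64_MAX

/*
 * Model of the current behavior: the literal string "now" is special
 * and means "everything". Any other value is assumed to have been
 * parsed into an absolute timestamp already (parsed_value).
 */
static int64_t prune_style_cutoff(const char *arg, int64_t parsed_value)
{
	if (!strcmp(arg, "now"))
		return EXPIRE_EVERYTHING;
	return parsed_value;
}

/*
 * Model of the proposed gc behavior: "now" is resolved to the clock
 * value sampled when gc starts, before any repacking happens.
 */
static int64_t gc_style_cutoff(const char *arg, int64_t gc_start,
			       int64_t parsed_value)
{
	if (!strcmp(arg, "now"))
		return gc_start;
	return parsed_value;
}

/* Assumed rule: an unreachable object is pruned if it is not newer than the cutoff. */
static int would_prune(int64_t object_mtime, int64_t cutoff)
{
	return object_mtime <= cutoff;
}
```

With gc starting at time 1000 and a concurrent fetch writing an object at 
time 1060, would_prune() says yes for the first cutoff and no for the 
second, while an object written at time 900 is pruned either way.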

Anyway, with that explanation in mind, I'm appending a patch that is 
pretty small and does that. It's a bit hacky, but I think it still makes 
sense.

Comments?

Note that this isn't likely to be very noticeable on most projects. When 
I do "git gc" on a fairly well-packed repo of git itself, it takes under 
4s for me. So the window for that whole "do git pull at the same time" is 
simply not much of an issue.

For the kernel, "git gc" takes a minute and a half on the same machine 
(again, this is already a packed repo, it can be worse). So there's a much 
bigger window there to do something stupid.

             Linus

---
 builtin/gc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index c4777b244..98368c8b5 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -535,8 +535,12 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (argc > 0)
 		usage_with_options(builtin_gc_usage, builtin_gc_options);
 
-	if (prune_expire && parse_expiry_date(prune_expire, &dummy))
-		die(_("failed to parse prune expiry value %s"), prune_expire);
+	if (prune_expire) {
+		if (!strcmp(prune_expire, "now"))
+			prune_expire = show_date(time(NULL), 0, DATE_MODE(ISO8601));
+		if (parse_expiry_date(prune_expire, &dummy))
+			die(_("failed to parse prune expiry value %s"), prune_expire);
+	}
 
 	if (aggressive) {
 		argv_array_push(&repack, "-f");


