Re: [PATCH v3] gc: ignore old gc.log files

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 10 Feb 2017 11:47:19 -0800

David Turner <dturner@xxxxxxxxxxxx> writes:

> It would be good if automatic gc were useful to server operators.
> A server can end up in a state whre there are
> lots of unreferenced loose objects (say, because many users are doing
> a bunch of rebasing and pushing their rebased branches). Before this
> patch, this state would cause a gc.log file to be created, preventing
> future auto gcs.  Then pack files could pile up.  Since many git
> operations are O(n) in the number of pack files, this would lead to
> poor performance.  Now, the pack files will get cleaned up, if
> necessary, at least once per day.  And operators who find a need for
> more-frequent gcs can adjust gc.logExpiry to meet their needs.
>
> Git should never get itself into a state where it refuses to do any
> maintenance, just because at some point some piece of the maintenance
> didn't make progress.
>
> In this commit, git learns to ignore gc.log files which are older than
> (by default) one day old.  It also learns about a config, gc.logExpiry
> to manage this.  There is also some cleanup: a successful manual gc,
> or a warning-free auto gc with an old log file, will remove any old
> gc.log files.
>
> It might still happen that manual intervention is required
> (e.g. because the repo is corrupt), but at the very least it won't be
> because Git is too dumb to try again.

Thanks, nicely explained.

> +	if (fstat(get_lock_file_fd(&log_lock), &st)) {
> +		if (errno == ENOENT) {
> +			/*
> +			 * The user has probably intentionally deleted
> +			 * gc.log.lock (perhaps because they're blowing
> +			 * away the whole repo), so thre's no need to
> +			 * report anything here.  But we also won't
> +			 * delete gc.log, because we don't know what
> +			 * the user's intentions are.
> +			 */

OK.

> +		} else {
> +			FILE *fp;
> +			int fd;
> +			int saved_errno = errno;
> +			/*
> +			 * Perhaps there was an i/o error or another
> +			 * unlikely situation.  Try to make a note of
> +			 * this in gc.log.  If this fails again,
> +			 * give up and leave gc.log as it was.
> +			 */
> +			rollback_lock_file(&log_lock);
> +			fd = hold_lock_file_for_update(&log_lock,
> +						       git_path("gc.log"),
> +						       LOCK_DIE_ON_ERROR);
> +
> +			fp = fdopen(fd, "w");
> +			fprintf(fp, _("Failed to fstat %s: %s"),
> +				get_tempfile_path(&log_lock.tempfile),
> +				strerror(errno));

I think you meant to use saved_errno in this message and then

> +			fclose(fp);
> +			commit_lock_file(&log_lock);

possibly assign it back to errno around here?

> +		}
> +
> +	} else if (st.st_size) {
> +		/* There was some error recorded in the lock file */
>  		commit_lock_file(&log_lock);
> -	else
> +	} else {
> +		/* No error, clean up any old gc.log */
> +		unlink(git_path("gc.log"));
>  		rollback_lock_file(&log_lock);
> +	}
>  }

> diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> index 1762dfa6a..84ad07eb2 100755
> --- a/t/t6500-gc.sh
> +++ b/t/t6500-gc.sh
> @@ -67,5 +67,18 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
>  	test_line_count = 2 new # There is one new pack and its .idx
>  '
>  
> +test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
> +	keep=$(ls .git/objects/pack/*.pack|head -1|sed -e "s/pack$/keep/") &&

You could save one process with:

    ls .git/objects/pack/*.pack | sed -e "s/pack$/keep/" -e q

but do you even need $keep?  I do not see it used below.

> +	test_commit foo &&
> +	test_commit bar &&
> +	git repack &&
> +	test_config gc.autopacklimit 1 &&
> +	test_config gc.autodetach true &&
> +	echo fleem >.git/gc.log &&
> +	test_must_fail git gc --auto 2>err &&
> +	test_i18ngrep "^error:" err &&
> +	test-chmtime =-86401 .git/gc.log &&
> +	git gc --auto
> +'
>  
>  test_done

Other than that I didn't spot anything suspicious.  I'll wait to see
what others may want to add.

Thanks.