RE: [PATCH v3] gc: ignore old gc.log files

David Turner <David.Turner@xxxxxxxxxxxx> · Fri, 10 Feb 2017 20:44:27 +0000

> -----Original Message-----
> From: Jeff King [mailto:peff@xxxxxxxx]
> Sent: Friday, February 10, 2017 3:09 PM
> To: David Turner <David.Turner@xxxxxxxxxxxx>
> Cc: git@xxxxxxxxxxxxxxx; pclouds@xxxxxxxxx
> Subject: Re: [PATCH v3] gc: ignore old gc.log files
> 
> On Fri, Feb 10, 2017 at 02:20:19PM -0500, David Turner wrote:
> 
> > @@ -76,10 +77,47 @@ static void git_config_date_string(const char
> > *key, const char **output)  static void process_log_file(void)  {
> >  	struct stat st;
> > -	if (!fstat(get_lock_file_fd(&log_lock), &st) && st.st_size)
> > +	if (fstat(get_lock_file_fd(&log_lock), &st)) {
> > +		if (errno == ENOENT) {
> > +			/*
> > +			 * The user has probably intentionally deleted
> > +			 * gc.log.lock (perhaps because they're blowing
> > +			 * away the whole repo), so thre's no need to
> > +			 * report anything here.  But we also won't
> > +			 * delete gc.log, because we don't know what
> > +			 * the user's intentions are.
> > +			 */
> 
> Hrm. Does fstat actually trigger ENOENT in that case? There's no pathname
> lookup happening at all. A simple test on Linux seems to show that it does not.
> Build:
> 
> 	#include <unistd.h>
> 	#include <fcntl.h>
> 	#include <sys/stat.h>
> 
> 	int main(int argc, char **argv)
> 	{
> 		struct stat st;
> 		int fd = open(argv[1], O_WRONLY|O_CREAT|O_EXCL, 0600);
> 		unlink(argv[1]);
> 		fstat(fd, &st);
> 		return 0;
> 	}
> 
> and run:
> 
>   strace ./a.out tmp
> 
> which shows:
> 
>   open("tmp", O_WRONLY|O_CREAT|O_EXCL, 056660) = 3
>   unlink("tmp")                           = 0
>   fstat(3, {st_mode=S_IFREG|S_ISUID|S_ISGID|0640, st_size=0, ...}) = 0
> 
> But maybe there are other systems emulating fstat() would trigger here.
> I dunno.

Yeah, I'm also not sure.  On the other hand, if we're going to catch fstat 
errors anyway, we might as well do something sensible with this one.

> > +		} else {
> > +			FILE *fp;
> > +			int fd;
> > +			int saved_errno = errno;
> > +			/*
> > +			 * Perhaps there was an i/o error or another
> > +			 * unlikely situation.  Try to make a note of
> > +			 * this in gc.log.  If this fails again,
> > +			 * give up and leave gc.log as it was.
> > +			 */
> > +			rollback_lock_file(&log_lock);
> > +			fd = hold_lock_file_for_update(&log_lock,
> > +						       git_path("gc.log"),
> > +						       LOCK_DIE_ON_ERROR);
> 
> If there was an i/o error, then gc.log.lock still exists. And this attempt to lock will
> therefore fail, calling die(). Which would trigger our atexit() to call process_log(),
> which would hit this code again, and so forth. I'm not sure if we'd actually
> recurse when an atexit handler calls exit(). But it seems questionable.

No, because  we call rollback_log_file first.

> I'm also not sure why we need to re-open the file in the first place. We have an
> open descriptor (and we even redirected stderr to it already).
> Why don't we just write to it?

If fstat failed, that probably indicates something bad about the old fd.  I'm not 
actually sure why fstat would ever fail, since in all likelihood, the kernel keeps 
information about inodes corresponding to open fds in-memory.  Maybe someone
forcibly unmounted the drive?  

> > @@ -113,6 +151,9 @@ static void gc_config(void)
> >  	git_config_get_bool("gc.autodetach", &detach_auto);
> >  	git_config_date_string("gc.pruneexpire", &prune_expire);
> >  	git_config_date_string("gc.worktreepruneexpire",
> > &prune_worktrees_expire);
> > +	if (!git_config_get_value("gc.logexpiry", &value))
> > +		parse_expiry_date(value, &gc_log_expire_time);
> > +
> 
> Should you be using git_config_date_string() here? It looks like it does some
> extra sanity-checking. It annoyingly just gets the string, and doesn't parse it.
> Perhaps it would be worth adding a
> git_config_date_value() helper.

That seems like a good idea, but I'm going to skip it for now and promise to
do it next time I need a date config.

> Or alternatively, save the date string here, and then parse once later on, after
> having resolved all config (and overwritten the default value).

Sure.

> > @@ -448,5 +506,8 @@ int cmd_gc(int argc, const char **argv, const char
> *prefix)
> >  		warning(_("There are too many unreachable loose objects; "
> >  			"run 'git prune' to remove them."));
> >
> > +	if (!daemonized)
> > +		unlink(git_path("gc.log"));
> > +
> 
> I think this is a good thing to do, though I'd have probably put it in a separate
> patch. I guess that's a matter of taste.

I could go either way, but since I've already gone this way, I'll stick with it.

> > +test_expect_success 'background auto gc does not run if gc.log is present
> and recent but does if it is old' '
> > +	keep=$(ls .git/objects/pack/*.pack|head -1|sed -e "s/pack$/keep/") &&
> > +	test_commit foo &&
> > +	test_commit bar &&
> > +	git repack &&
> > +	test_config gc.autopacklimit 1 &&
> > +	test_config gc.autodetach true &&
> > +	echo fleem >.git/gc.log &&
> > +	test_must_fail git gc --auto 2>err &&
> > +	test_i18ngrep "^error:" err &&
> > +	test-chmtime =-86401 .git/gc.log &&
> > +	git gc --auto
> > +'
> 
> This gives only 1 second of leeway. I wonder if we could end up getting bogus
> failures due to system clock adjustments, or even skew between the filesystem
> and OS clocks. Perhaps we should set it farther back, like a few days.
> 
> (It also relies on the 1-day default. That's probably OK, though we could also set
> an explicit default for the test, which would exercise the config code path, too).

Sure, I'll re-roll with that change.