Re: [PATCH v1 2/2] read-cache: make sure file handles are not inherited by child processes

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 08 Sep 2016 10:37:53 -0700

Lars Schneider <larsxschneider@xxxxxxxxx> writes:

>>> We probably should be using O_NOATIME for all O_RDONLY cases
>>> to get the last bit of performance out (especially since
>>> non-modern-Linux systems probably still lack relatime).
>> 
>> No, please do not go there.
>> 
>> The user can read from a file in a working tree using "less",
>> "grep", etc., and they all update the atime, so should "git grep".
>> We do not use atime ourselves on these files but we should let
>> outside tools rely on the validity of atime (e.g. "what are the
>> files that were looked at yesterday?").
>> 
>> If you grep for noatime in our current codebase, you'd notice that
>> we use it only for files in objects/ hierarchy, and that makes very
>> good sense.  These files are what we create for our _sole_ use and
>> no other tools can peek at them and expect to get any useful
>> information out of them (we hear from time to time that virus
>> scanners leaving open file descriptors on them causing trouble, but
>> that is an example of a useless access), and that makes a file in
>> objects/ hierarchy a fair game for noatime optimization.
>
> How do we deal with read-cache:ce_compare_data, though?

We are using open() in our current code in that codepath, without
NOATIME.  We shouldn't start using git_open_noatime() merely for
convenience.

We would probably want to do a preliminary refactoring so that we
can say "we want only CLOEXEC and not NOATIME".  Perhaps we would
want to make the existing one in sha1_file.c to something like:

	int git_open_noatime(const char *name)
        {
		return git_open(name, GIT_OPEN_NOATIME);
	}

and then add

	#define GIT_OPEN_NOATIME 01

to cache.h and add

	int git_open(const char *name, unsigned int flag)
        {
        	static int open_noatime = O_NOATIME;

		for (;;) {
                	int fd, open_flag;

                        open_flag = 0;
                        if (flag & GIT_OPEN_NOATIME)
                                open_flag |= open_noatime;

			errno = 0;
                        fd = open(name, O_RDONLY | open_flag);
                        if (fd >= 0)
                        	return fd;

			if (errno == ENOENT || !open_flag)
				return -1;

			/* The failure may be due to additional	flags */
                       	if ((flag & GIT_OPEN_NOATIME) &&
			    (open_flag & O_NOATIME)) {
				flag &= ~GIT_OPEN_NOATIME;
				open_noatime = 0;
			}
		}
	}

to wrapper.c in the first step, which is a "no-op refactoring" step.

Then add

	#define GIT_OPEN_CLOEXEC 02

and update git_open(), perhaps like so:

	int git_open(const char *name, unsigned int flag)
        {
        	static int open_noatime = O_NOATIME;
        	static int open_cloexec = O_CLOEXEC;

		for (;;) {
                	int fd, open_flag;

                        open_flag = 0;
                        if (flag & GIT_OPEN_NOATIME)
                                open_flag |= open_noatime;
                        if (flag & GIT_OPEN_CLOEXEC)
                                open_flag |= open_cloexec;

			errno = 0;
                        fd = open(name, O_RDONLY | open_flag);
                        if (fd >= 0)
                        	return fd;

			if (errno == ENOENT || !open_flag)
				return -1;

			/* The failure may be due to additional	flags */
                       	if ((flag & GIT_OPEN_NOATIME) &&
			    (open_flag & O_NOATIME)) {
				flag &= ~GIT_OPEN_NOATIME;
                                open_noatime = 0;
			}
                       	if ((flag & GIT_OPEN_CLOEXEC) &&
			    (open_flag & O_CLOEXEC)) {
				flag &= ~GIT_OPEN_CLOEXEC;
                                open_cloexec = 0;
			}
		}
	}

The retry logic is "if we were asked to do this flag, and if we did
pass that flag, then we know open() with that flag fails here, so we
won't waste time trying with it again", which came from the NOATIME
codepath we already have, but it may not match what we use CLOEXEC
for and may need to be adjusted.  I didn't think that part of the
code through.

Then the ce codepath that reads from the working tree would use

	git_open(ce->name, GIT_OPEN_CLOEXEC);

to obtain the file descriptor for reading, perhaps?