Lars Schneider <larsxschneider@xxxxxxxxx> writes: >>> We probably should be using O_NOATIME for all O_RDONLY cases >>> to get the last bit of performance out (especially since >>> non-modern-Linux systems probably still lack relatime). >> >> No, please do not go there. >> >> The user can read from a file in a working tree using "less", >> "grep", etc., and they all update the atime, so should "git grep". >> We do not use atime ourselves on these files but we should let >> outside tools rely on the validity of atime (e.g. "what are the >> files that were looked at yesterday?"). >> >> If you grep for noatime in our current codebase, you'd notice that >> we use it only for files in objects/ hierarchy, and that makes very >> good sense. These files are what we create for our _sole_ use and >> no other tools can peek at them and expect to get any useful >> information out of them (we hear from time to time that virus >> scanners leaving open file descriptors on them causing trouble, but >> that is an example of a useless access), and that makes a file in >> objects/ hierarchy a fair game for noatime optimization. > > How do we deal with read-cache:ce_compare_data, though? We are using open() in our current code in that codepath, without NOATIME. We shouldn't start using git_open_noatime() merely for convenience. We would probably want to do a preliminary refactoring so that we can say "we want only CLOEXEC and not NOATIME". Perhaps we would want to make the existing one in sha1_file.c to something like: int git_open_noatime(const char *name) { return git_open(name, GIT_OPEN_NOATIME); } and then add #define GIT_OPEN_NOATIME 01 to cache.h and add int git_open(const char *name, unsigned int flag) { static int open_noatime = O_NOATIME; for (;;) { int fd, open_flag; open_flag = 0; if (flag & GIT_OPEN_NOATIME) open_flag |= open_noatime; errno = 0; fd = open(name, O_RDONLY | open_flag); if (fd >= 0) return fd; if (errno == ENOENT || !open_flag) return -1; /* The failure may be due to additional flags */ if ((flag & GIT_OPEN_NOATIME) && (open_flag & O_NOATIME)) { flag &= ~GIT_OPEN_NOATIME; open_noatime = 0; } } } to wrapper.c in the first step, which is a "no-op refactoring" step. Then add #define GIT_OPEN_CLOEXEC 02 and update git_open(), perhaps like so: int git_open(const char *name, unsigned int flag) { static int open_noatime = O_NOATIME; static int open_cloexec = O_CLOEXEC; for (;;) { int fd, open_flag; open_flag = 0; if (flag & GIT_OPEN_NOATIME) open_flag |= open_noatime; if (flag & GIT_OPEN_CLOEXEC) open_flag |= open_cloexec; errno = 0; fd = open(name, O_RDONLY | open_flag); if (fd >= 0) return fd; if (errno == ENOENT || !open_flag) return -1; /* The failure may be due to additional flags */ if ((flag & GIT_OPEN_NOATIME) && (open_flag & O_NOATIME)) { flag &= ~GIT_OPEN_NOATIME; open_noatime = 0; } if ((flag & GIT_OPEN_CLOEXEC) && (open_flag & O_CLOEXEC)) { flag &= ~GIT_OPEN_CLOEXEC; open_cloexec = 0; } } } The retry logic is "if we were asked to do this flag, and if we did pass that flag, then we know open() with that flag fails here, so we won't waste time trying with it again", which came from the NOATIME codepath we already have, but it may not match what we use CLOEXEC for and may need to be adjusted. I didn't think that part of the code through. Then the ce codepath that reads from the working tree would use git_open(ce->name, GIT_OPEN_CLOEXEC); to obtain the file descriptor for reading, perhaps?