Re: [PATCH/RFC] read-cache: fix file time comparisons with different precisions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Karsten,

On 2015-09-28 12:39, Karsten Blees wrote:
> Different git variants record file times in the index with different
> precisions, according to their capabilities. E.g. git compiled with NO_NSEC
> records seconds only, JGit records the mtime in milliseconds, but leaves
> ctime blank (because ctime is unavailable in Java).
> 
> This causes performance issues in git compiled with USE_NSEC, because index
> entries with such 'incomplete' timestamps are considered dirty, triggering
> unnecessary content checks.
> 
> Add a file time comparison function that auto-detects the precision based
> on the number of trailing 0 digits, and compares with the lower precision
> of both values. This initial version supports the known precisions seconds
> (git + NO_NSEC), milliseconds (JGit) and nanoseconds (git + USE_NSEC), but
> can be easily extended to e.g. microseconds.
> 
> Use the new comparison function in both dirty and racy checks. As a side
> effect, this fixes racy detection in USE_NSEC-enabled git with
> core.checkStat=minimal, as the coreStat setting now affects racy checks as
> well.
> 
> Finally, do not check ctime if ctime.sec is 0 (as recorded by JGit).

Great analysis, and nice patch. I would like to offer one suggestion in addition:

> diff --git a/read-cache.c b/read-cache.c
> index 87204a5..3a4e6cd 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -99,23 +99,50 @@ void fill_stat_data(struct stat_data *sd, struct stat *st)
>  	sd->sd_size = st->st_size;
>  }
>  
> +/*
> + * Compares two file times. Returns 0 if equal, <0 if t1 < t2, >0 if t1 > t2.
> + * Auto-detects precision based on trailing 0 digits. Compares seconds only if
> + * core.checkStat=minimal.
> + */
> +static inline int cmp_filetime(uint32_t t1_sec, uint32_t t1_nsec,
> +			       uint32_t t2_sec, uint32_t t2_nsec) {
> +#ifdef USE_NSEC
> +	/*
> +	 * Compare seconds and return result if different, or checkStat=mimimal,
> +	 * or one of the time stamps has second precision only (nsec == 0).
> +	 */
> +	int diff = t1_sec - t2_sec;
> +	if (diff || !check_stat || !t1_nsec || !t2_nsec)
> +		return diff;
> +
> +	/*
> +	 * Check if one of the time stamps has millisecond precision only (i.e.
> +	 * the trailing 6 digits are 0). First check the trailing 6 bits so that
> +	 * we only do (slower) modulo division if necessary.
> +	 */
> +	if ((!(t1_nsec & 0x3f) && !(t1_nsec % 1000000)) ||
> +	    (!(t2_nsec & 0x3f) && !(t2_nsec % 1000000)))
> +		/* Compare milliseconds. */
> +		return (t1_nsec - t2_nsec) / 1000000;
> +
> +	/* Compare nanoseconds */
> +	return t1_nsec - t2_nsec;
> +#else
> +	return t1_sec - t2_sec;
> +#endif
> +}

As this affects only setups where the same repository is accessed via clients with different precision, would it make sense to hide this behind a config option? I.e. something like

static int cmp_filetime_precise(uint32_t t1_sec, uint32_t t1_nsec,
			        uint32_t t2_sec, uint32_t t2_nsec)
{
#ifdef USE_NSEC
	return t1_sec != t2_sec ? t1_sec - t2_sec : t1_nsec - t2_nsec;
#else
	return t1_sec - t2_sec;
#endif
}

static int cmp_filetime_mixed(uint32_t t1_sec, uint32_t t1_nsec,
			      uint32_t t2_sec, uint32_t t2_nsec)
{
#ifdef USE_NSEC
	... detect lower precision and compare with lower precision only...
#else
	return t1_sec - t2_sec;
#endif
}

static (int *)cmp_filetime(uint32_t t1_sec, uint32_t t1_nsec,
			   uint32_t t2_sec, uint32_t t2_nsec)
	= cmp_filetime_precise;

... modify cmp_filetime_precise if core.mixedTimeSpec = true...

Otherwise there would be that little loop-hole where (nsec % 1000) == 0 *by chance* and we assume the timestamps to be identical even if they are not.

Ciao,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]