Re: broken racy detection and performance issues with nanosecond file times

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 28 Sep 2015 11:17:19 -0700

Karsten Blees <karsten.blees@xxxxxxxxx> writes:

> Ideas for potential solutions:
> ==============================
>
> Performance issues:
> -------------------
>
> 1. Compare file times in minimum supported precision
>    When comparing file times, use the minimum precision supported by
>    both the writing and reading git implementations.
> 1a. Simplest variant: Don't compare nanoseconds if the field in the
>    cached index entry is 0. JGit already does this [5], but at the
>    same time it is very unfriendly to USE_NSEC-enabled git by storing
>    only milliseconds in the nanosecond field. This "simple" solution
>    implies that git implementations that cannot provide full
>    nanosecond precision must leave the nanosecond field empty.
> 1b. More involved: Store the precision in the index entry.
>    We only need 30 bits to encode nanoseconds, so the high 2 bits of
>    the nanosecond field could be used as follows:
>    00: second precision (i.e. ignore, for backward compatibility)
>    01: millisecond precision
>    10: microsecond precision
>    11: nanosecond precision
>    When reading the index, USE_NSEC-enabled git implementations would
>    do dirty checks with the minimum precision supported by themselves
>    and the creator of the index entry.

Yeah, my gut feeling is that we should make sure that at least 1a is
done by all implementations.
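
To make 1a concrete, here is a rough sketch of the comparison, in
Python for brevity rather than git's C. FileTime and mtime_matches
are made-up names for illustration, not anything in git or JGit, and
the sketch assumes a writer without sub-second precision really does
store 0 in the nanosecond field.

```python
from typing import NamedTuple


class FileTime(NamedTuple):
    """Hypothetical stand-in for a (seconds, nanoseconds) timestamp."""
    sec: int
    nsec: int


def mtime_matches(cached: FileTime, on_disk: FileTime) -> bool:
    """Return True if the cached time is consistent with the on-disk time."""
    if cached.sec != on_disk.sec:
        return False
    # Variant 1a: a zero nanosecond field in the cached entry is taken to
    # mean "the writer had no sub-second precision", so the comparison
    # stops at whole seconds.
    if cached.nsec == 0:
        return True
    return cached.nsec == on_disk.nsec
```

A file whose true nanosecond part happens to be 0 is simply compared
at second precision under this rule.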

I agree that 1b is a bit more involved, in that every binary built
with USE_NSEC that is not aware of these two bits needs to be
eradicated before a new version can be deployed. The transition will
be a pain for users who use multiple implementations (those who use
just one implementation of Git can just say "rm -f .git/index && git
reset --hard" or something after updating to the new version of
Git).
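
For reference, the 1b encoding could look roughly like the sketch
below (Python again, for brevity). All names are hypothetical. It
assumes a 32-bit field whose low 30 bits carry the nanoseconds and
whose high 2 bits carry the precision tag from the table above, and
that a coarser writer truncates (rather than rounds) the value.

```python
NSEC_MASK = (1 << 30) - 1      # low 30 bits: nanoseconds (max 999,999,999)
PRECISION_SHIFT = 30           # high 2 bits: precision tag

# Precision tag -> size of one unit in nanoseconds. Tags are ordered so
# that a smaller tag always means a coarser precision.
UNIT_NS = {
    0b00: 1_000_000_000,  # second precision (nanoseconds ignored)
    0b01: 1_000_000,      # millisecond precision
    0b10: 1_000,          # microsecond precision
    0b11: 1,              # nanosecond precision
}


def pack_nsec(nsec: int, precision: int) -> int:
    """Combine a nanosecond value and a precision tag into one field."""
    if not 0 <= nsec < 1_000_000_000:
        raise ValueError("nsec out of range")
    if precision not in UNIT_NS:
        raise ValueError("unknown precision tag")
    return (precision << PRECISION_SHIFT) | nsec


def unpack_nsec(field: int) -> tuple[int, int]:
    """Split a field into (nanoseconds, precision tag)."""
    return field & NSEC_MASK, field >> PRECISION_SHIFT


def nsec_matches(field: int, on_disk_nsec: int, reader_precision: int) -> bool:
    """Compare at the coarser of the writer's and the reader's precision."""
    cached_nsec, writer_precision = unpack_nsec(field)
    unit = UNIT_NS[min(writer_precision, reader_precision)]
    return cached_nsec // unit == on_disk_nsec // unit
```

An entry written by a binary that does not know about the tag would
normally have 00 in the high bits, so a new reader falls back to
second precision for it. The pain is in the other direction, where
an old USE_NSEC binary reads a tagged field as a plain nanosecond
count.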

> 2. Don't use ctime in dirty checks if ctime.sec == 0.

OK.  That is slightly less drastic than !trust_ctime, I guess.
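
In sketch form (Python for brevity, hypothetical names, whole seconds
only), the rule in 2. amounts to treating a zero ctime.sec in the
cached entry as "not recorded" instead of ignoring ctime everywhere:

```python
def stat_unchanged(cached_mtime_sec: int, cached_ctime_sec: int,
                   disk_mtime_sec: int, disk_ctime_sec: int) -> bool:
    """Return True if the cached times give no reason to call the file dirty."""
    if cached_mtime_sec != disk_mtime_sec:
        return False
    # Proposal 2: a writer that cannot supply ctime stores 0, and the
    # reader then skips the ctime comparison for that entry only.
    if cached_ctime_sec == 0:
        return True
    return cached_ctime_sec == disk_ctime_sec
```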

> Racy detection:
> ---------------
>
> 3. Minimal racy solution
>    * Do all racy checks with second-precision only.
>    * When committing an index.lock file, reset mtime to the time
>      before git started reading the old index (i.e. time(NULL) when
>      calling read_cache()).
>
>    I believe this should fix all three racy problems described above,
>    although restraining ourselves to second-precision somewhat
>    thwarts the ability to track nanoseconds in the first place.
>    
>    The problem with this solution is that files changed by git itself
>    will appear racy to the next git process, thus increasing the
>    performance penalty after e.g. a large checkout. Although I think
>    that re-reading the file after the file's mtime is the only way to
>    be really sure it hasn't been changed.

... the last of which is what is done anyway, so I think the above,
especially the second bullet-point, is all sensible.
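
A rough illustration of the two bullets in 3., in Python for brevity.
The names is_racy and commit_index_lock are made up, and this is only
a sketch of the idea under those assumptions, not how git's lockfile
code is structured.

```python
import os


def is_racy(entry_mtime_sec: int, index_mtime_sec: int) -> bool:
    """Second-precision racy check.

    An entry is suspect if the file was modified in the same second as,
    or later than, the time recorded on the index file.
    """
    return entry_mtime_sec >= index_mtime_sec


def commit_index_lock(lock_path: str, index_path: str,
                      read_started_sec: int) -> None:
    """Backdate the lock file, then rename it over the index.

    read_started_sec is the wall-clock second sampled before the old
    index was read, so anything touched during the run looks racy later.
    """
    os.utime(lock_path, (read_started_sec, read_started_sec))
    os.replace(lock_path, index_path)
```

With the index backdated this way, every file git itself wrote during
the run satisfies is_racy() for the next process, which is the
performance cost mentioned above.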