Re: [PATCH] travis-ci: run previously failed tests first, then slowest to fastest

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 27 Jan 2016 11:05:07 -0800

Clemens Buchacher <drizzd@xxxxxx> writes:

> Coming back to "[PATCH] optionally disable gitattributes": The topics
> are related, because they both deal with the situation where the work
> tree has files which are not normalized according to gitattributes. But
> my patch is more about saying: ok, I know I may have files which need to
> be normalized, but I want to ignore this issue for now. Please disable
> gitattributes for now, because I want to work with the files as they are
> committed. Conversely, the discussion here is about how to reliably
> detect and fix files which are not normalized.

I primarily wanted to make sure that you understood the underlying
issue, so that I do not have to go back to the basics in the other
thread.  And it is clear that you do, which is good.

Here, you seem to think that what t0025 wants to see happen is
sensible, judging by the fact that you call "rm .git/index && git
reset" a "fix".

My take on this is quite different.  After a "reset --hard HEAD", we
should be able to trust the cached stat information and have "diff
HEAD" say "no changes".  That is what you essentially want in the
other thread, if I understand you correctly, and in an ideal world
where the filesystem timestamp has infinite precision, that is what
would happen in t0025, always "breaking" its expectation.  The real
world has much coarser timestamp granularity than ideal, and that is
why the test appears to be "flaky", failing to give the "correct"
outcome some of the time--but I'd say that it is expecting a wrong
thing.
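
To illustrate why coarse timestamps make the outcome look random, here
is a toy model of the "can the cached stat data be trusted?" decision.
It is only a sketch: the function names and the integer timestamps are
made up for illustration, and the rule (an entry whose mtime is the same
as or newer than the index file's timestamp gets its contents compared)
is a simplification of the Racy Git handling, not Git's actual code.

```python
def needs_content_check(entry_mtime: int, index_mtime: int) -> bool:
    """Hypothetical helper: True when the cached stat data cannot be trusted.

    An entry written in the same (coarse) tick as the index file, or
    later, might have changed without the stat data showing it.
    """
    return entry_mtime >= index_mtime


def reported_modified(entry_mtime: int, index_mtime: int,
                      content_matches: bool) -> bool:
    """Hypothetical helper: would the path show up as modified?"""
    if not needs_content_check(entry_mtime, index_mtime):
        # Stat data is trusted, so the contents are never looked at.
        return False
    # Stat data is not trusted, so the answer depends on the comparison.
    return not content_matches


# A path whose contents fail the comparison (inconsistent data):
# written one tick before the index file, the stat data is trusted.
early = reported_modified(entry_mtime=100, index_mtime=101,
                          content_matches=False)
# Written in the same tick as the index file, the contents get compared.
same_tick = reported_modified(entry_mtime=101, index_mtime=101,
                              content_matches=False)
print(early, same_tick)
```

Under this model the same path is judged clean or modified depending
only on which timestamp tick the file happened to be written in.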

An index entry that has data that does not round-trip when it goes
through convert_to_working_tree() and then convert_to_git() "breaks"
this arrangement, and I'd view it as the user having inconsistent
data.  It is like being in a repository that still has unmerged
paths--you cannot proceed before you resolve them.
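
As a rough illustration of "does not round-trip", the sketch below uses
two toy helpers that stand in for convert_to_working_tree() and
convert_to_git() under a CRLF working-tree convention.  The helper
names and their behaviour are assumptions for the example; the real
conversion code handles many more cases.

```python
def to_working_tree(blob: bytes) -> bytes:
    """Toy stand-in for convert_to_working_tree(): give every line CRLF."""
    return blob.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def to_git(data: bytes) -> bytes:
    """Toy stand-in for convert_to_git(): normalize CRLF to LF."""
    return data.replace(b"\r\n", b"\n")


def round_trips(blob: bytes) -> bool:
    """True when checking the blob out and adding it back is a no-op."""
    return to_git(to_working_tree(blob)) == blob


normalized = b"one\ntwo\n"        # LF in the index: consistent
inconsistent = b"one\r\ntwo\r\n"  # CRLF committed as-is: inconsistent

print(round_trips(normalized))
print(round_trips(inconsistent))
```

The second blob is the "inconsistent data" case: what "git add" would
record from the checked-out file is not what is in the index.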

Anyway.

As to your patch in the other thread, here is what I think:

 (1) When you know (or perhaps your CI knows) that the working tree
     has never been modified since you did "reset --hard HEAD" (or
     its equivalent, like "git checkout $branch" from a clean
     state), these paths with inconsistent data would break the
     usual check to ask "is the working tree clean?"  That is a
     problem and we need a way to ensure that the working tree is
     always judged to be clean immediately after "reset --hard
     HEAD".  IOW, I agree with you that the issue you are trying to
     solve is worth solving.

 (2) Regardless of the "inconsistent data breaking the cleanliness
     check" issue, it may be handy to have a way to temporarily
     disable the attributes, i.e. allow us to ask "what happens if
     there are no attributes defined?"  IOW, I am saying that the
     change in the patch is not without merit.

In addition to (1), I further think that this sequence should not
report that the path F is modified:

     # Write F from HEAD to the working tree, after passing it
     # through convert_to_working_tree()
     $ git reset --hard HEAD

     # Force the re-reading, without changing the contents at all
     $ cp F F.new
     $ mv F.new F

     $ git diff HEAD

which is broken by paths with inconsistent data.  Your CI would want
a way to make that happen.

However, I do not think disabling attributes (i.e. (2)) is a
solution to the issue (i.e. (1)), which we just agreed to be an
issue that is worth solving, for at least two reasons.

 * Even without any attributes, the core.autocrlf setting can get
   the data in your index (whose lines can be terminated with CRLF)
   into the same "inconsistent data" situation.  Disabling attribute
   handling would not have any effect on that codepath, I think.

 * The indexed data and the contents in the working tree file may
   match only because the clean/smudge transformation is done.  If
   you disable attributes, re-checking by passing the working tree
   contents through convert_to_git() and comparing the result with
   what is in the index would tell you that they are different, even
   if the clean/smudge filter pair implements round-trip operations
   correctly.
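
The second point can be sketched with a toy clean/smudge pair.
Everything here is hypothetical: the "$Rev$" keyword filter, the
function names and the attributes_enabled switch are invented for the
example and are not an existing Git interface.

```python
import re


def smudge(blob: bytes) -> bytes:
    """Toy smudge filter: expand a keyword on checkout."""
    return blob.replace(b"$Rev$", b"$Rev: 42$")


def clean(data: bytes) -> bytes:
    """Toy clean filter: collapse the keyword again on add."""
    return re.sub(rb"\$Rev: [^$]*\$", b"$Rev$", data)


def looks_modified(indexed: bytes, worktree: bytes,
                   attributes_enabled: bool) -> bool:
    """Compare what "git add" would record against the indexed data."""
    would_record = clean(worktree) if attributes_enabled else worktree
    return would_record != indexed


indexed = b"version $Rev$\n"
worktree = smudge(indexed)  # what a checkout with the filter wrote

# The filter pair round-trips, so the path is clean with attributes on.
print(looks_modified(indexed, worktree, attributes_enabled=True))
# With attributes disabled, the same untouched file compares as different.
print(looks_modified(indexed, worktree, attributes_enabled=False))
```

So turning the attributes off can introduce a false "modified" for
paths that were perfectly consistent to begin with.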

One way to solve (1) I can think of is to change the definition of
ce_compare_data(), which is called by the code that does not trust
the cached stat data (including but not limited to the Racy Git
codepath).  The current semantics of that function asks this
question:

    We do not know if the working tree file and the indexed data
    match.  Let's see if "git add" of that path would record the
    data that is identical to what is in the index.

This definition was cast in stone by 29e4d363 (Racy GIT, 2005-12-20)
and has been with us since Git v1.0.0.  But that does not have to be
the only sensible definition of this check.  I wonder what would
break if we ask this question instead:

    We do not know if the working tree file and the indexed data
    match.  Let's see if "git checkout" of that path would leave the
    same data as what currently is in the working tree file.

If we did this, "reset --hard HEAD" followed by "diff HEAD" will by
definition always report "is clean" as long as nobody changes files
in the working tree, even with the inconsistent data in the index.
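
The difference between the two questions can be sketched as follows.
This is a toy model under assumptions: to_git() and to_working_tree()
are simplified CRLF helpers standing in for convert_to_git() and
convert_to_working_tree(), and the two matches_*() functions are
hypothetical names for the current and the proposed semantics of
ce_compare_data(), not its real signature.

```python
def to_working_tree(blob: bytes) -> bytes:
    """Toy stand-in for convert_to_working_tree(): give every line CRLF."""
    return blob.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def to_git(data: bytes) -> bytes:
    """Toy stand-in for convert_to_git(): normalize CRLF to LF."""
    return data.replace(b"\r\n", b"\n")


def matches_add_semantics(indexed: bytes, worktree: bytes) -> bool:
    """Current question: would "git add" record what is in the index?"""
    return to_git(worktree) == indexed


def matches_checkout_semantics(indexed: bytes, worktree: bytes) -> bool:
    """Proposed question: would "git checkout" write the same file?"""
    return to_working_tree(indexed) == worktree


# Inconsistent data: CRLF stored in the index as-is.
indexed = b"one\r\ntwo\r\n"
# Simulate "reset --hard HEAD": the file is whatever checkout writes.
worktree = to_working_tree(indexed)

print(matches_add_semantics(indexed, worktree))       # reported modified
print(matches_checkout_semantics(indexed, worktree))  # reported clean
```

With the checkout-based question, a file that was just written by
checkout compares equal by construction, whatever is in the index.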

This still requires that convert_to_working_tree(), i.e. your smudge
filter, be deterministic, but I think that is a sensible assumption
for sane people, even for those with inconsistent data in the index.
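
To see why determinism matters for the checkout-based check, consider
this small sketch.  Both smudge functions and is_clean() are made up
for the example; the counter merely stands in for anything that varies
between runs, such as a timestamp embedded by a filter.

```python
import itertools

_runs = itertools.count(1)


def deterministic_smudge(blob: bytes) -> bytes:
    """Same input always yields the same working tree contents."""
    return blob.replace(b"\n", b"\r\n")


def nondeterministic_smudge(blob: bytes) -> bytes:
    """Appends a value that changes on every invocation."""
    return blob + b"# checkout %d\n" % next(_runs)


def is_clean(indexed: bytes, worktree: bytes, smudge) -> bool:
    """Checkout-based check: would checkout write the same file again?"""
    return smudge(indexed) == worktree


indexed = b"data\n"

# Deterministic filter: the file written by checkout compares clean.
print(is_clean(indexed, deterministic_smudge(indexed),
               deterministic_smudge))

# Non-deterministic filter: re-running it gives different bytes, so the
# untouched file is reported as modified.
print(is_clean(indexed, nondeterministic_smudge(indexed),
               nondeterministic_smudge))
```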