[RFH] eol=lf on existing mixed line-ending files

I investigated some odd git behavior with the EOL gitattributes today,
and I'm curious to hear what others on the list think of what git does.
In particular, index raciness means git produces non-deterministic
results in this case.

The repo in question has a gitattributes file with "* crlf=input" (which
we would spell "eol=lf" these days, but the results are the same), but
still contains some files with mixed line endings. You can reproduce
this with:

  git init repo &&
  cd repo &&
  {
    printf 'one\n' &&
    printf 'two\r\n'
  } >mixed &&
  git add mixed &&
  git commit -m one &&
  echo '* eol=lf' >.gitattributes
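
To double-check that a file really does have mixed endings, something
like the following can help. It is only a rough sketch: "count_eols" is
a name I made up for illustration, not anything git ships, and it
assumes a POSIX shell and grep.

```shell
# count_eols FILE: print how many lines end in CRLF and how many end in
# a bare LF. Hypothetical helper, not part of git.
count_eols() (
    cr=$(printf '\r')
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    crlf=$(grep -c "$cr\$" "$1" || true)
    total=$(grep -c '' "$1" || true)
    echo "crlf=$crlf lf=$((total - crlf))"
)
```

Running "count_eols mixed" in the repo above prints "crlf=1 lf=1".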

Now if we run "git status" or "git diff", it will let us know that
"mixed" is modified, insofar as adding and committing it would perform
the LF conversion.
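
One way to see that conversion without touching the index is to compare
the blob recorded in HEAD with the blob that would be created from the
working tree file. This is a sketch, not a definitive check:
"would_normalize" is a hypothetical helper name, and it relies on "git
hash-object" applying the attribute-driven conversion to the named path
(which it does unless given --no-filters).

```shell
# would_normalize PATH: succeed if hashing the working tree copy of PATH
# (with checkin conversion applied) yields a different blob than the one
# recorded in HEAD. Hypothetical helper; must be run inside the repo.
would_normalize() {
    committed=$(git rev-parse --verify -q "HEAD:$1") || return 2
    converted=$(git hash-object -- "$1") || return 2
    [ "$committed" != "$converted" ]
}
```

In the example repo, "would_normalize mixed" should succeed once the
.gitattributes file is in place, and fail before it is written.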

Now we come to the first confusing behavior. Generally one would expect
the working directory to be clean after a "git reset --hard". But not
here:

  git reset --hard &&
  git status

will still show "mixed" as modified, because of course we are checking
out the version from HEAD into the index and working tree, and that
version has the mixed line endings. So we rewrite the identical file.

So that kind of makes sense. But it isn't all that helpful, if I just
want to reset my working tree to something sane without making a new
commit (more on this later).

But here's an extra helping of confusion on top. Every once in a while,
doing the reset _won't_ keep "mixed" as modified. I can trigger it
reliably by inserting an extra sleep into git:

diff --git a/unpack-trees.c b/unpack-trees.c
index 500ebcf..735b13e 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -223,6 +223,7 @@ static int check_updates(struct unpack_trees_options *o)
 		}
 	}
 	stop_progress(&progress);
+	sleep(1);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
 	return errs != 0;

That puts a delay between when reset writes the "mixed" file, and when
we write out the refreshed index. So next time we look at the index
(e.g., in "status"), we will see that the "mixed" entry has up-to-date
stat information and not look at its actual contents.

But in the original case (without the sleep), that doesn't happen.
There, we usually end up writing the file and the index in the same
second. So when status looks at the index, the "mixed" entry is racily
clean, and we actually check it again.
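
The timestamp comparison behind "racily clean" can be sketched roughly
like this. It is a simplification under stated assumptions:
"is_racily_clean" is a hypothetical name, it uses GNU stat ("stat -c
%Y"), it looks at the file's current mtime rather than the one cached in
the index entry, and it only has one-second granularity.

```shell
# is_racily_clean FILE INDEX: succeed if INDEX was not written strictly
# later than FILE (whole seconds), i.e. FILE's cached stat data cannot be
# trusted and its contents have to be compared again.
# Hypothetical helper; assumes GNU stat.
is_racily_clean() {
    file_mtime=$(stat -c %Y "$1") &&
    index_mtime=$(stat -c %Y "$2") &&
    [ "$index_mtime" -le "$file_mtime" ]
}
```

In the example repo it would be called as "is_racily_clean mixed
.git/index". Without the sleep it will usually succeed right after the
reset; with the sleep-patched git it should fail.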

So we get two different outcomes, depending on the index raciness. Which
one is right, or is it right for it to be non-deterministic?

And one final question. Let's say I don't immediately convert this mixed
file to the correct line-endings. Instead, it persists over a large
number of commits, some of them even changing the "mixed" file but not
fixing the line endings[1]. We can simulate that with:

  mv .gitattributes tmp &&
  echo three >>mixed &&
  git commit -a -m three &&
  mv tmp .gitattributes

Now imagine I am somebody who has cloned this repo; the clone will tend
to come out of the race in the "clean" state, since it will often take
more than 1 second to write out all of the files (at least for a
normal-sized project). We can simulate that using our sleep-patched
reset:

  git reset --hard

to get a "clean" repo. Now let's say I want to explore old history, so I
go to a detached HEAD, but using normal git, not the sleep-patched one:

  git checkout HEAD^

And, of course, now we think "mixed" is modified. After I'm done
exploring, I want to go back to "master", but I can't:

  $ git checkout master
  error: Your local changes to the following files would be overwritten by checkout:
          mixed

What is the best way out of this situation? You can't use "reset --hard"
to fix the working tree. I guess "git checkout -f" is the best option.

Hopefully my example made sense and was reproducible. The real repo
which triggered this puzzle was jquery. You can try:

  git clone git://github.com/jquery/jquery.git &&
  cd jquery &&
  git checkout 1.4.2 &&
  git checkout master

which will fail (but may succeed racily on a slow enough machine).
Obviously they need to fix the mixed line-ending files in their repo.
But that fix would be on HEAD, and "git checkout 1.4.2" will be forever
broken. Is there a way to fix that?

-Peff

[1] The one thing still puzzling me about the jquery repo is how they
managed to make so many commits (including ones to mixed line ending
files) without seeing the dirty working tree state and committing it. Is
there some combination of config that makes this not happen?