Re: [RFH] eol=lf on existing mixed line-ending files

On Sat, Apr 09, 2011 at 10:58:59PM +0400, Dmitry Potapov wrote:

> > Now we come to the first confusing behavior. Generally one would expect
> > the working directory to be clean after a "git reset --hard". But not
> > here:
> >
> >   git reset --hard &&
> >   git status
> >
> > will still show "mixed" as modified.
> 
> It is because you discard all changes except to .gitattributes.  If
> .gitattributes were tracked, "reset" would discard them too, and you
> would get clean original state.

Yeah, in this case. But gitattributes could easily be in the repository
already, and reset still wouldn't change it (as it is in the jquery
example).

> > So that kind of makes sense. But it isn't all that helpful, if I just
> > want to reset my working tree to something sane without making a new
> > commit (more on this later).
> 
> If we do not discard changes to .gitattributes then the question is
> what a sane state is? It is really difficult to define what is sane
> when conversion to the work tree and back gives a different result.

Agreed. The problem is the disconnect between what is in the repository,
and what _would_ be in the repository if we committed the file. So
obviously what the user is giving to git in this case is slightly
insane.
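
A minimal sketch of that disconnect, assuming a throwaway repository
with one file named "mixed" (the directory, file name, and identity
below are made up for illustration). It compares the blob that is
committed with the blob that git would store if the file were added
again under the new attributes:

```shell
# Throwaway repository; every name here is illustrative only.
demo=$(mktemp -d) && cd "$demo" && git init -q .
git config core.autocrlf false   # keep the demo independent of global config

# One CRLF line and one LF line, committed before any attributes exist.
printf 'one\r\ntwo\n' > mixed
git add mixed
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add mixed'

# Ask for LF normalization without touching the committed blob.
echo '* eol=lf' > .gitattributes

in_repo=$(git rev-parse HEAD:mixed)         # what is in the repository
if_added=$(git hash-object mixed)           # what would be stored now
raw=$(git hash-object --no-filters mixed)   # the bytes on disk, unconverted

echo "in repository: $in_repo"
echo "if committed:  $if_added"
echo "raw on disk:   $raw"
```

The first and third IDs should match each other and differ from the
second, which is why the file can show up as modified even though
nobody edited it.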

I just wonder if git can do better. But the only options I could think
of are:

  1. Set the working tree file to have just LF's. But that doesn't help,
     since it is the conversion _to_ linefeeds that makes it look like
     the file is changed. So we'd still see unstaged changes.

  2. Set the index file to have just LF's. That would make the working
     tree look clean, but it would look like changes are staged, which
     is even worse.
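
Both options can be tried by hand. The following is only a sketch in a
scratch repository (the file name "mixed" and the commit identity are
assumptions), recording the exit codes of "git diff --quiet" after
each step:

```shell
# Scratch repository with a committed mixed line-ending file.
demo=$(mktemp -d) && cd "$demo" && git init -q .
git config core.autocrlf false
printf 'one\r\ntwo\n' > mixed
git add mixed
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add mixed'
echo '* eol=lf' > .gitattributes

# Option 1: rewrite the working tree file with LFs only.
tr -d '\r' < mixed > mixed.tmp && mv mixed.tmp mixed
opt1_unstaged=0
git diff --quiet -- mixed || opt1_unstaged=$?

# Option 2: restore the mixed file, then put the LF-only version in the index.
printf 'one\r\ntwo\n' > mixed
git add mixed
opt2_unstaged=0
git diff --quiet -- mixed || opt2_unstaged=$?
opt2_staged=0
git diff --cached --quiet -- mixed || opt2_staged=$?

echo "option 1 unstaged: $opt1_unstaged"
echo "option 2 unstaged: $opt2_unstaged, staged: $opt2_staged"
```

A non-zero code means git reports a difference. Option 1 should leave
an unstaged change, and option 2 should leave the working tree clean
but with a staged change.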

> > But here's an extra helping of confusion on top. Every once in a while,
> > doing the reset _won't_ keep "mixed" as modified. I can trigger it
> > reliably by inserting an extra sleep into git:
> 
> you can have the same effect by doing:
> 
> git reset --hard HEAD && sleep 1 && touch .git/index

Yeah, that has the same effect. I wanted to show the sleep inside git to
demonstrate that it really is an inside-git race condition.

> Ironically, the race that you observed is the result of fixing another
> race in git, where files are changed so fast that they may have the
> same timestamp. To prevent this race, git checks the timestamps of
> .git/index and a tracked file. If the .git/index timestamp is older
> than or the same as that file's, the file is considered dirty, so it is
> re-read from disk to check whether there are any changes. This works
> well, but only if conversion to the work tree and back produces the
> same result.

Yeah, that's my analysis, too.
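
As a rough model of the rule described above (not git's actual code,
and the helper name is hypothetical), the decision boils down to
whether the index is strictly newer than the file:

```shell
# Hypothetical helper: succeeds when the cached stat data for a file cannot
# be trusted, i.e. when the index is NOT strictly newer than the file.
# $1: path of the index file, $2: path of the tracked file.
needs_content_check() {
    ! [ "$1" -nt "$2" ]
}

tmp=$(mktemp -d)

# Same second for both: the file might have changed after the index was
# written, so the content has to be re-read.
touch -t 201104092258.59 "$tmp/file" "$tmp/index"
if needs_content_check "$tmp/index" "$tmp/file"; then
    same_second=recheck
else
    same_second=trust
fi

# Index written one second later: the cached stat data can be trusted.
touch -t 201104092259.00 "$tmp/index"
if needs_content_check "$tmp/index" "$tmp/file"; then
    later_index=recheck
else
    later_index=trust
fi

echo "same second: $same_second, later index: $later_index"
```

In the "recheck" case the file goes through the to-git conversion
again, which is where a mixed line-ending file stops matching the
committed blob. In the "trust" case it is never looked at, so it
appears clean.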

> > So we get two different outcomes, depending on the index raciness. Which
> > one is right, or is it right for it to be non-deterministic?
> 
> I like everything being deterministic, but in this case I do not see
> how it is possible without making the normal case much slower.

I think if you took my (1) suggestion above, it would be deterministic.
I don't know how much that would help. It would at least force people to
always see the change and hopefully spur them to commit the fixed
line-endings.

> > And one final question. Let's say I don't immediately convert this mixed
> > file to the correct line-endings.
> 
> IMHO, adding .gitattributes that specifies line endings while not
> fixing actual line endings of existing files is really a bad idea.

I absolutely agree, and my first advice upon seeing this jquery repo was
to fix those line endings. But they went for over a year with the broken
setup, so clearly it wasn't bothering them. I wonder what git could do
better to provoke them to fix it sooner.

> As with any other filter, the rule is that conversion from git to
> the working tree and back should give the same result for any file
> in the repository, otherwise you will have a lot of trouble later.

I think that's a good rule in general, but doesn't crlf=input (and now
eol=lf, and by extension, the text attribute) encourage exactly that if
you have mixed line-ending files?

I think the moral of the story may simply be that mixed line-ending text
files are an abomination which should be rooted out and destroyed.

> >   git clone git://github.com/jquery/jquery.git &&
> >   cd jquery &&
> >   git checkout 1.4.2 &&
> >   git checkout master
> >
> > which will fail (but may succeed racily on a slow enough machine).
> > Obviously they need to fix the mixed line-ending files in their repo.
> > But that fix would be on HEAD, and "git checkout 1.4.2" will be forever
> > broken. Is there a way to fix that?
> 
> You cannot change the past history. Well, you can override that
> setting using .git/info/attributes. It does not make sense to do
> that in general, but it may be useful if you do git bisect.

The problem with that is that for recent commits you want one set of
attributes (where the files have been fixed), and for going back to
older commits, you want a different set of attributes (where you say
"don't care about line endings in these files").
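
For reference, the local override might look like the sketch below.
The "* -text" pattern is an assumption about what "don't care about
line endings" would translate to, and the repository and file name are
made up:

```shell
# Scratch repository whose .gitattributes asks for LF normalization.
demo=$(mktemp -d) && cd "$demo" && git init -q .
git config core.autocrlf false
printf 'one\r\ntwo\n' > mixed
echo '* eol=lf' > .gitattributes

# Local, uncommitted override: it takes precedence over .gitattributes.
mkdir -p .git/info
echo '* -text' >> .git/info/attributes

text_attr=$(git check-attr text -- mixed)
raw=$(git hash-object --no-filters mixed)
filtered=$(git hash-object mixed)

echo "$text_attr"
echo "raw:      $raw"
echo "filtered: $filtered"
```

With the override in place the two IDs should be identical, meaning no
line-ending conversion is applied. The catch is the one described
above: the override applies to whatever commit is checked out, old or
new, until the line is removed again.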

One solution would be to have a git-notes ref with per-commit
attributes, so you could selectively override attributes as you explore
history.

> BTW, nowadays, we have much better alternative than using
> 
> * crlf=input
> 
> Instead of it, you probably want to use:
> 
> * text=auto

Agreed, and I already recommended that to jquery people (actually, one
of the problem files you will see in the example above is a binary file,
though later on they ended up fixing its attributes by specifically
marking its extension as binary).
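
A sketch of that kind of setup, using a hypothetical "*.bin" extension
and made-up file names rather than anything taken from the jquery
repository:

```shell
# Scratch repository; only the attribute lookup matters here.
demo=$(mktemp -d) && cd "$demo" && git init -q .

# Let git detect text files, and exempt one (hypothetical) binary extension.
printf '%s\n' '* text=auto' '*.bin binary' > .gitattributes

# check-attr works on path names; the files do not need to exist.
readme_attr=$(git check-attr text -- README)
blob_attr=$(git check-attr text -- logo.bin)

echo "$readme_attr"
echo "$blob_attr"
```

The later, more specific line wins for matching paths, so files with
the exempted extension are never line-ending converted while
everything else is left to auto-detection.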

-Peff