Re: eol round trip Was: [PATCH] travis-ci: run previously failed ....

On 01/27/2016 08:05 PM, Junio C Hamano wrote:
(Changed the topic, 2 notes inside)
Clemens Buchacher <drizzd@xxxxxx> writes:

Coming back to "[PATCH] optionally disable gitattributes": The topics
are related, because they both deal with the situation where the work
tree has files which are not normalized according to gitattributes. But
my patch is more about saying: ok, I know I may have files which need to
be normalized, but I want to ignore this issue for now. Please disable
gitattributes for now, because I want to work with the files as they are
committed. Conversely, the discussion here is about how to reliably
detect and fix files which are not normalized.
git ls-files --eol can detect that (as Junio pointed out)
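To make the "detect" part concrete, here is a rough sketch of the kind of classification involved. It is my own simplified model, not Git's code: the function names are hypothetical, the labels only loosely follow what "git ls-files --eol" reports, and no binary detection is attempted.

```python
def classify_eol(data: bytes) -> str:
    """Classify the line endings of a blob as 'none', 'lf', 'crlf' or 'mixed'.

    Simplified model: a lone CR (one not followed by LF) is lumped into
    'mixed'. Real Git applies additional heuristics that are not modelled here.
    """
    crlf = data.count(b"\r\n")
    lone_lf = data.count(b"\n") - crlf
    lone_cr = data.count(b"\r") - crlf
    if crlf == 0 and lone_lf == 0 and lone_cr == 0:
        return "none"
    if lone_cr == 0 and lone_lf == 0:
        return "crlf"
    if lone_cr == 0 and crlf == 0:
        return "lf"
    return "mixed"


def breaks_text_promise(index_blob: bytes) -> bool:
    """True if a blob stored for a 'text' path is not LF-only."""
    return classify_eol(index_blob) in ("crlf", "mixed")


if __name__ == "__main__":
    samples = [b"a\nb\n", b"a\r\nb\r\n", b"a\r\nb\n", b"a\r\r\n", b"no newline"]
    for blob in samples:
        print(repr(blob), classify_eol(blob), breaks_text_promise(blob))
```

A path whose index content is flagged by breaks_text_promise() is one that "needs to be normalized" in the sense discussed here.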

I primarily wanted to make sure that you understood the underlying
issue, so that I do not have to go back to the basics in the other
thread.  And it is clear that you obviously do, which is good.

Here, you seem to think that what t0025 wants to see happen is
sensible, judging by the fact that you call "rm .git/index && git
reset" a "fix".

My take on this is quite different.  After a "reset --hard HEAD", we
should be able to trust the cached stat information and have "diff
HEAD" say "no changes".  That is what you essentially want in the
other thread, if I understand you correctly, and in an ideal world
where the filesystem timestamp has infinite precision, that is what
would happen in t0025, always "breaking" its expectation.  The real
world has much coarser timestamp granularity than ideal, and that is
why the test appears to be "flaky", failing to give the "correct" outcome
some of the time--but I'd say that it is expecting a wrong thing.

An index entry that has data that does not round-trip when it goes
through convert_to_working_tree() and then convert_to_git() "breaks"
this arrangement, and I'd view it as the user having inconsistent
data.  It is like you are in a repository that still has unmerged
paths--you cannot proceed before you resolve them.
This is actually bringing some light to me: the round-trip test.
There are these "well-known but less well documented" situations where we break that rule:
- files are checked in with CRLF into the repo, and
  .gitattributes is set to "text" later.
2 different ways to handle it:
- keep the eol at checkout, normalize at checkin -> roundtrip broken
- keep the eol at checkout and checkin -> roundtrip OK

- files with mixed line endings in the repo:
Same here: 2 different ways to handle it:
- keep the eol at checkout, normalize at checkin -> roundtrip broken
- keep the eol at checkout and checkin -> roundtrip OK

- files with CRCRLF line endings in the repo:
Same here: 2 different ways to handle it:
- keep the eol at checkout, normalize at checkin -> roundtrip broken
- keep the eol at checkout and checkin -> roundtrip OK
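The three cases above can be played through with a small model. This is only a sketch under my own assumptions: the function names are made up, the blobs are plain byte strings, and "normalize" means nothing more than replacing CRLF with LF. It is not a copy of convert_to_working_tree() or convert_to_git().

```python
def checkout_keep(blob: bytes) -> bytes:
    """'keep the eol at checkout': the work tree gets the blob unchanged."""
    return blob


def checkin_normalize(work: bytes) -> bytes:
    """'normalize at checkin': every CRLF becomes LF."""
    return work.replace(b"\r\n", b"\n")


def checkin_keep(work: bytes) -> bytes:
    """'keep the eol at checkin': store the work tree content unchanged."""
    return work


def round_trips(blob: bytes, checkin) -> bool:
    """True if checkout followed by checkin reproduces the index blob."""
    return checkin(checkout_keep(blob)) == blob


CASES = {
    "CRLF in repo": b"one\r\ntwo\r\n",
    "mixed line endings": b"one\r\ntwo\n",
    "CRCRLF in repo": b"one\r\r\ntwo\r\r\n",
}

if __name__ == "__main__":
    for name, blob in CASES.items():
        for label, checkin in (("normalize", checkin_normalize),
                               ("keep", checkin_keep)):
            state = "OK" if round_trips(blob, checkin) else "broken"
            print(f"{name}: checkin={label} -> roundtrip {state}")
```

In this model a blob that is already LF-only round-trips under both policies, which is why the problem only shows up for content that was committed before the attribute was set.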

My feeling is that we should simply say:
You, the user, set the attribute to "text", and by doing that you promised to have files
with LF only in the index.
If you break that promise, Git does not know what you really want.
- It may be a situation where you write a shell script which for some reason needs a '\015' at the end of a line, and Git may treat it wrongly by assuming
  that this is a CRLF line ending (and convert it into LF).
- It may be that you want CRLF because you added a Windows .BAT file.
- It may be that you use git.git and another implementation of Git which doesn't support attributes at all, so that a safe way to do this is to just commit CRLF.
- It may be that this is a historical issue.
  Everybody using the project uses git that understands .gitattributes,
  so someone may fix it some day.

Can Git make this decision?

When core.autocrlf is true (and no attributes are set), the conversion of line endings is disabled for files that already have a CR in the index.
See convert.c, "This is the new safer autocrlf handling",
commit fd6cce9e

So the round trip is achieved when core.autocrlf=true,
but no longer when attributes are added.
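
A hedged sketch of why that is, as I understand the "safer autocrlf" rule: content that already has a CR in the index is left alone in both directions, whereas the "text" attribute always normalizes at checkin. The function names are hypothetical and this is a model of the behaviour, not the code in convert.c.

```python
def checkout_autocrlf(blob: bytes) -> bytes:
    """core.autocrlf=true, no attributes: add CR only to CR-free content."""
    if b"\r" in blob:
        return blob  # already has CR: do not touch it
    return blob.replace(b"\n", b"\r\n")


def checkin_autocrlf(work: bytes, index_blob: bytes) -> bytes:
    """core.autocrlf=true, no attributes: skip conversion if the index has CR."""
    if b"\r" in index_blob:
        return work  # the "safer autocrlf" case: leave the content alone
    return work.replace(b"\r\n", b"\n")


def checkin_text_attr(work: bytes) -> bytes:
    """'text' attribute set: always normalize CRLF to LF at checkin."""
    return work.replace(b"\r\n", b"\n")


def round_trip_autocrlf(blob: bytes) -> bool:
    return checkin_autocrlf(checkout_autocrlf(blob), blob) == blob


def round_trip_text_attr(blob: bytes) -> bool:
    # Checkout is modelled as "keep the eol" for simplicity.
    return checkin_text_attr(blob) == blob


if __name__ == "__main__":
    for blob in (b"a\nb\n", b"a\r\nb\r\n", b"a\r\nb\n"):
        print(repr(blob),
              "autocrlf:", round_trip_autocrlf(blob),
              "text attr:", round_trip_text_attr(blob))
```

In this model every blob round-trips under core.autocrlf=true, while blobs with CRLF in the index stop round-tripping as soon as the "text" attribute applies.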

Anyway.

As to your patch in the other thread, here is what I think:

  (1) When you know (or perhaps your CI knows) that the working tree
      has never been modified since you did "reset --hard HEAD" (or
      its equivalent, like "git checkout $branch" from a clean
      state), these paths with inconsistent data would break the
      usual check to ask "is the working tree clean?"  That is a
      problem and we need a way to ensure that the working tree is
      always judged to be clean immediately after "reset --hard
      HEAD".  IOW, I agree with you that the issue you are trying to
      solve is worth solving.

  (2) Regardless of the "inconsistent data breaking the cleanliness
      check" issue, it may be handy to have a way to temporarily
      disable the attributes, i.e. allow us to ask "what happens if
      there is no attributes defined?"  IOW, I am saying that the
      change in the patch is not without merit.

In addition to (1), I further think that this sequence should not
report that the path F is modified:

      # Write F from HEAD to the working tree, after passing it
      # through convert_to_working_tree()
      $ git reset --hard HEAD

      # Force the re-reading, without changing the contents at all
      $ cp F F.new
      $ mv F.new F

      $ git diff HEAD

which is broken by paths with inconsistent data.  Your CI would want
a way to make that happen.

However, I do not think disabling attributes (i.e. (2)) is a
solution to the issue (i.e. (1)), which we just agreed to be an
issue that is worth solving, for at least two reasons.

  * Even without any attributes, core.autocrlf setting can get the
    data in your index (whose lines can be terminated with CRLF) into
    the same "inconsistent data" situation.  Disabling attribute
    handling would not have any effect on that codepath, I think.

I don't think so, see above.
