Henrik Grubbström <grubba@xxxxxxxxx> writes: >> 3. "git update-index --refresh" does not improve the situation, as it >> (thinks) it knows the blob and the working tree file are different. > > False. "git update-index --refresh" uses > read-cache.c:ce_compare_data() to compare the content of the blob with > the normalized content of the working tree,... I don't think you tried it yourself. Here is what should happen with the current code. # step 0 & 1. a project with LF ending $ git init two && cd two $ echo a quick brown fox >kuzu $ git add kuzu && git commit -m kuzu # step 2. you want CRLF in your work area $ echo -e "a quick brown fox\015" >kuzu $ git config core.autocrlf true $ git diff diff --git a/kuzu b/kuzu # step 3. oops, refresh $ git update-index --refresh kuzu: needs update And it is a common thing people wish to do. Admittedly, this is an one-off event, so it is not _that_ urgent to fix. You can for example do: # step 4. you haven't changed anything really, so you can add $ git add kuzu $ git diff $ git diff --cached ;# empty! to force re-index. > Let's take the reverse case instead: > > 0. For some reason a file using CRLF line endings has entered the > repository. # step 0. a blob with CRLF $ git init one && cd one $ echo -e "a quick brown fox\015" >kuzu $ git add kuzu && git commit -m kuzu > 1. The user notices the mistake, and sets crlf. The index is still > clean, but the user wants the file with LF line endings, so the > user does a "git checkout -- the_file". # step 1. you want CRLF in work area, LF in repository $ git config core.autocrlf true $ git diff ;# clean! $ git checkout -- kuzu $ git diff ;# clean! $ cat -v kuzu a quick brown fox^M One glitch is that this "checkout" becomes a no-op because the file is stat-clean. This is something your "record in the index entry what normalization was used when we checked it out (or when we read and hashed)" approach should be able to fix. It however does not need the re-indexed object name. Side note: if you want to have LF in both work tree and in repository, then you wouldn't use core.autocrlf. Instead you would do: # step 1 (irrelevant alternative). you want LF in both $ dos2unix kuzu $ git diff ;# clean! > 2. The index is now dirty, so the user performs a "git update-index > --refresh". I think you see exactly the same behaviour in the example sequence I gave you (blobs with LF with working files with CRLF, core.autocrlf set) and in your example sequence (blobs with CRLF with working files with LF, core.autocrlf set) in this case. What happens to my example are already shown above. Continuing your example, because in reality the index is not dirty, we would need to make it stat-dirty first. # step 2. you try to refresh $ touch kuzu $ git update-index --refresh kuzu: needs update $ git checkout -- kuzu $ cat -v kuzu a quick brown fox^M $ git diff ;# shows changes! diff --git a/kuzu b/kuzu index .... If you are trying to somehow make this last "git diff" silent, then I think you are solving a _wrong_ problem. By setting retroactively the CRLF setting, you are saying that you do not want to have CRLF in the blobs recorded in the repository, and it is a _good thing_ that there are differences (tons of them) between what is recorded currently and what you are going to commit to fix the earlier mistake. >> I was hoping to see a solution where you will add a stronger version of >> "refresh" without having to do anything else other than recording "how did >> I munge the file in the working tree to produce the blob". The third step >> would change to: >> >> 3. "git update-index --refresh" notices that the conversion parameters >> are different since the last time the files in the working tree were >> looked at (i.e. immediately after a "clone", working tree files are >> what git wrote out using convert_to_working_tree() and you know what >> conversion you used; after the user modified files in the working tree >> and said "git add", you know you what conversion parameters you ran >> convert_to_git() with to produce blobs). The paths that has different >> conversion parameters are re-indexed to see if they hash to the same >> sha1 as recorded in the index. If they have changed, their index >> entries are left intact (i.e. you will still show the differences); >> otherwise you update the cached stat information for their index >> entries. > > I believe that most people that have edited a file that has changed > CRLF convention aren't interested in that all lines have changed, but > in only the lines that have actually been edited. I think that is not just solving a wrong problem but also is encouraging a bad workflow at the same time, which is even worse. If you recorded CRLF blobs in a project that you do not want to have CRLF blob, fixing that mistake is one logical change that people who later browse the history would not want to see intermixed with the "lines that actually have been edited". > Storing the normalized sha1 is needed to reduce the amount of double > work (eg having "git update-index --refresh" reperform > convert_to_git() for > the repository blob every time a file is dirty, instead of (as now) > just comparing the sha1 values). As I already said, I agree that it would be beneficial to store what normalization settings were used and comparing that with what settings are in effect to detect the possible phamtom difference caused by the change of the settings. But once we know that the result of a re-normalization is different from what is recorded in the index (or tree), then the difference should be shown. The actual difference would change every time the work tree file is edited, so I don't see the benefit of contaminate the object database with intermediate "blobs" that is not "added". -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html