RE: is this data corruption?

On December 30, 2022 7:18 PM, Samuel Wales wrote:
>i am not subscribed, but am of the impression that's ok.  please copy me directly.
>
>
>tldr: git diff is showing differences that do not exist in the files themselves.
>
>i have nothing staged, nothing fancy like stashing, etc.  this is a repo of mostly
>emacs org mode files.  mostly ascii text.
>
>git status and these commands show nothing unusual:
>
>    git fsck --strict --no-dangling
>    git gc --prune="0 days"
>
>
>the problem that seems like data corruption is that a few lines appear twice as -
>and once as +.  but in the current version of the files, those lines exist only once.
>here are the lines.  there are 2 - versions and one + version:
>
>+***************** REF bigpart is a partition biglike and homelike are
>+distracting nonsense i think except to describe inferior filesets.
>+anomalous subset of home might be called homelike or so.
>
>
>emacs magit shows the same problem.  however, it shows a slightly different diff.
>i did a meta-diff on git diff vs. magit, and there are about 800 + real-content lines
>that magit shows but git diff does not.  i do not know what this means.  wc -l is like
>
>  62540 aaa.diff
>  62965 bbb--magit.txt
>
>idk why a diff would be different with only + lines being different?
>
>
>in summary, what is wrong with my repo, if anything, and what can i do about it?
>nothing on the web for git corruption seems to say much, other than pull from
>github or whatever.  this is my own repo, the original repo, so i cannot do that.  org
>annex has an uncorrupt tool of some kind, but it did not seem relevant.  i do have
>rsnapshot [basically rsync] backups of the repo and the most significant files and
>dirs, but i do not know what one does to use that to repair any issues.  i won't get
>into why, but changes were made over months.
>
>is there a protocol for this?
>
>would git fsck have balked?
>
>thank you!
>
>
>p.s.
>
>i have no reason to believe this is related, but git diff has intermingled emacs org
>mode entries.  but i don't have to talk about it in org terms; in generic text terms, it
>has intermingled parts of different paragraphs.  as a user, i'd prefer that
>completely unrelated paragraphs not be mingled, regardless of minimality.  if
>possible.
>
>with respect to the intermingling only, unless this is related to the possible
>corruption, i will presume the diff is correct, in that a patch from it would produce
>the same result as a patch that does not intermingle.  i believe this intermingling is
>because diff does not understand org, or paragraphs for that matter.  in org, an
>entry starts with "^[*]+ " and ends at the beginning of another entry or at eof.
>they consist in my case mostly of ascii text paragraphs.  just as with paragraphs, if
>you move an entry, you don't expect it to be mingled with a different one in the
>diff.
>
>i have been told that this cannot be fixed by merely telling a slightly improved
>differ that stuff between stars is worth preserving, but that a parser, not merely a
>couple of regexps, is needed to reduce this intermingling.  i have also been told
>that difftastic uses tree-sitter, which might get such a syntax for emacs org mode.
>and so maybe at some point git diff can use that.  idk.
>
>idk if any of this is related but i include it for completeness.
>
>also, please don't laugh, but i am using git version 2.11.0.  i will upgrade pending
>various library and os stuff but my main concern is not for git, but for possible
>corruption in the repo and what is possible to do, at least given rsnapshot, to fix it.

I cannot account for your emacs issues, but I have a question: have you done any git add operations? What git diff shows depends on what is staged. With no arguments it compares the working tree against the index, so if part of your changes is staged, it shows only the unstaged part, which could account for partial diffs. See whether git diff --cached (index against HEAD) or git diff HEAD (working tree against HEAD) makes a difference. Also try different diff algorithms, such as --patience or --diff-algorithm=histogram.
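To illustrate the staging point, here is a minimal sketch run in a throwaway repository, not in your real one. The temporary directory, the file name notes.org, and its contents are all made up for the example; only the git subcommands and options are real. It stages one change, leaves a second one unstaged, and counts the added lines each form of git diff reports.

```shell
#!/bin/sh
set -e

# Throwaway repository so the real repo is never touched (hypothetical path).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

# Hypothetical file standing in for an org file.
printf 'one\ntwo\n' > notes.org
git add notes.org
git commit -q -m "initial"

# Stage one change, then make a second change without staging it.
printf 'one\ntwo\nthree\n' > notes.org
git add notes.org
printf 'one\ntwo\nthree\nfour\n' > notes.org

# Count added content lines ("+++" file headers are excluded by the pattern).
unstaged=$(git diff | grep -c '^+[^+]' || true)          # working tree vs index
staged=$(git diff --cached | grep -c '^+[^+]' || true)   # index vs HEAD
total=$(git diff HEAD | grep -c '^+[^+]' || true)        # working tree vs HEAD

echo "unstaged=$unstaged staged=$staged total=$total"

# The same comparison with other diff algorithms; the hunks may be grouped
# differently, but applying either patch yields the same file.
git diff --patience HEAD > patience.diff
git diff --diff-algorithm=histogram HEAD > histogram.diff
```

In this toy case plain git diff reports only the unstaged line, while git diff HEAD reports both. If the three views agree in your repo, staging is not the explanation and the algorithm options are the next thing to compare.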

--Randall



