Re: Fw: [git-users] git fsck error - duplicate file entries - different then existing stackoverflow scenarios

On Thu, Nov 12, 2015 at 02:02:10PM +0300, Konstantin Khomoutov wrote:

> A user recently asked an interesting question on the git-users list.
> I think it warrants the attention of specialists more hard-core than
> we are over at git-users.
> 
> So I'd like to solicit help from those knowledgeable, if possible.

Thanks. Curating user questions and forwarding the hard ones here is
appreciated.

> I have a repo that is giving a 'git fsck --full' error that seems to be 
> different from the existing questions and answers on stackoverflow on
> this topic.  For example, in our fsck error it is not obvious which
> file is actually duplicated and how/where.  And there is no commit sha
> involved - apparently only blob and tree sha's.  But then finding good
> documentation on this is challenging.

Yes, fsck does not traverse the graph in order. So it sees a problem
with a particular tree, but cannot know where that tree is within the
whole project tree, or which commits reference it. In fact, an arbitrary
number of commits might reference it.

The most useful thing is sometimes to ask which commit introduced the
tree (which can _also_ have multiple answers, but usually just one). You
can do that by walking the history, like this:

  tree=df79068051fa8702eae7e91635cca7eee1339002
  git log --all --format=raw --raw -t --no-abbrev | less +/$tree

That will visit each commit. The options are:

  - we visit commits reachable from all branches and tags (--all)

  - we include the sha1 of the root tree (due to --format=raw)

  - adding --raw shows the raw diff, which includes the sha1 of each
    file touched by the commit

  - using "-t" includes the raw diff for trees, rather than just blobs

  - using "--no-abbrev" gives full 40-hex sha1s

And then "less +/$tree" will open the pager and immediately jump to the
first instance of the sha1 in question.
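
If you would rather get a list than page through the output, the same
log stream can be filtered. This is only a sketch: find_tree_commits is
a made-up helper name, and it assumes the output layout produced by the
options above (a "commit" header line, a "tree" header line, and raw
diff lines that start with ":" and put the path after a tab).

```shell
# Hypothetical helper: reads `git log --format=raw --raw -t --no-abbrev`
# output on stdin and prints each commit that has the given tree either
# as its root tree or as the new side of a raw diff entry.
find_tree_commits() {
    awk -v tree="$1" '
        /^commit / { commit = $2 }
        /^tree /   { if ($2 == tree) print commit, "(root tree)" }
        /^:/       {
            # Raw diff line: ":oldmode newmode oldsha newsha status<TAB>path"
            if ($4 == tree) {
                tab = index($0, "\t")
                print commit, substr($0, tab + 1)
            }
        }
    '
}
```

It would be fed like this:

  git log --all --format=raw --raw -t --no-abbrev | find_tree_commits "$tree"

Keep in mind that "git log --raw" does not show diffs for merge commits
by default, so a tree that first appears in a merge is only reported
here if it is that merge's root tree.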

But of course that doesn't tell you how to fix it. It might tell you how
the bogus object came about. And it is a bogus object: a bug-free git
implementation should _never_ produce a tree with duplicate entries.
AFAIK we have never had such a bug in Git itself, but I have
occasionally come across problematic entries that I suspect were created
with very old versions of JGit.

> error in tree df79068051fa8702eae7e91635cca7eee1339002: contains
> duplicate file entries
> [...]
> $ git ls-tree df79068051fa8702eae7e91635cca7eee1339002
> 100644 blob 14d6d1a6a2f4a7db4e410583c2893d24cb587766 build.gradle
> 120000 blob cd70e37500a35663957cf60f011f81703be5d032 msrc
> 040000 tree 658c892e15fbe0d3ea6b8490d9d54c5f2e658fc9 msrc
> 100644 blob f623819c94a08252298220871ac0ba1118372e59 pom.xml
> 100644 blob 9223cc2fddb138f691312c1ea2656b9dc17612d2 settings.gradle
> 040000 tree c3bac1d92722bdee9588a27747b164baa275201f src

Looks like "msrc" is your duplicate entry (even though the two entries
point at different objects, a tree cannot have two entries with the
same name). You can use the "log" trick above to find the full path to
the tree that contains it.
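
With a larger tree the duplicate is harder to spot by eye. A minimal
sketch for picking it out mechanically, relying on the fact that real
"git ls-tree" output separates the object id from the name with a tab
(duplicate_names is just an illustrative helper name):

```shell
# Hypothetical helper: reads `git ls-tree <tree>` output on stdin and
# prints every entry name that occurs more than once.
duplicate_names() {
    # Everything after the first tab is the entry name.
    cut -f2- | sort | uniq -d
}
```

Usage would be:

  git ls-tree df79068051fa8702eae7e91635cca7eee1339002 | duplicate_names

which for the listing quoted above should print only "msrc".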

The fact that one is a symlink (mode 120000) and one is a tree means
that whatever git implementation created this presumably has a bug
related to symlinks.

The only way to fix it is to rewrite the history mentioning the tree
(because once the tree is fixed, it will get a new sha1, and then any
commit referencing it will get a new sha1, and commits built on that,
and so forth).

You can use "git filter-branch" to do so. There is a sample command
here:

  http://stackoverflow.com/questions/32577974/duplicate-file-error-while-pushing-mirror-into-git-repository/

that just rewrites each tree via a round-trip to the index (so it's not
clear which of the duplicate entries it will discard). You could also
write a more clever index-filter snippet to use git-update-index to
insert the entry you want.
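
As a rough illustration of that last idea, here is an untested-in-anger
sketch of the index manipulation for the case where you want to keep the
symlink and drop the directory. The helper name replace_with_symlink is
invented for this example; the git commands it calls ("git rm --cached"
and "git update-index --add --cacheinfo") are the standard ones.

```shell
# Hypothetical helper for use inside an --index-filter.
#   $1 = path of the duplicated entry (e.g. msrc)
#   $2 = sha1 of the blob holding the symlink target
replace_with_symlink() {
    path=$1
    blob=$2
    # Drop whatever the index currently holds at or below the path...
    git rm -r -q --cached --ignore-unmatch -- "$path" &&
    # ...then record exactly one entry there: a symlink (mode 120000).
    git update-index --add --cacheinfo "120000,$blob,$path"
}
```

In a real rewrite you would presumably call it only for commits that
actually contain the corrupt tree (for example by comparing the output
of "git rev-parse" on the relevant tree of $GIT_COMMIT against the bad
sha1), and with the full path found via the "log" trick. Keeping the
directory instead would need a different approach, such as reading the
subtree back in with "git read-tree --prefix".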

-Peff