On Thu, Nov 12, 2015 at 02:02:10PM +0300, Konstantin Khomoutov wrote:

> A user recently asked an interesting question on the git-users list.
> I think it warrants attentions of a specialists more hard-core than
> we're there over at git-users.
>
> So I'd like to solicit help if those knowledgeable, if possible.

Thanks. Curating user questions and forwarding the hard ones here is
appreciated.

> I have a repo that is giving a 'git fsck --full' error that seems to be
> different from the existing questions and answers on stackoverflow on
> this topic. For example, in our fsck error it is not obvious which
> file is actually duplicated and how/where. And there is no commit sha
> involved - apparently only blob and tree sha's. But then finding good
> documentation on this is challenging.

Yes, fsck does not traverse the graph in order. So it sees a problem
with a particular tree, but cannot know where that tree is within the
whole project tree, or which commits reference it. In fact, an arbitrary
number of commits might reference it.

The most useful thing is sometimes to ask which commit introduced the
tree (which can _also_ have multiple answers, but usually just one). You
can do that by walking the history, like this:

  tree=df79068051fa8702eae7e91635cca7eee1339002
  git log --all --format=raw --raw -t --no-abbrev | less +/$tree

That will visit each commit. The options are:

  - we visit commits reachable from all branches and tags (--all)

  - we include the sha1 of the root tree (due to --format=raw)

  - adding --raw shows the raw diff, which includes the sha1 of each
    file touched by the commit

  - using "-t" includes the raw diff for trees, rather than just blobs

  - using "--no-abbrev" gives full 40-hex sha1s

And then "less +/$tree" will open the pager and immediately jump to the
first instance of the sha1 in question.

But of course that doesn't tell you how to fix it.
It might tell you how the bogus object came about (and it is a bogus
object; a bug-free git implementation should _never_ produce a tree with
duplicate entries. AFAIK we have never had such a bug in Git itself, but
I have occasionally come across problematic entries that I suspect were
created with very old versions of JGit).

> error in tree df79068051fa8702eae7e91635cca7eee1339002: contains
> duplicate file entries
> [...]
> $ git ls-tree df79068051fa8702eae7e91635cca7eee1339002
> > 100644 blob 14d6d1a6a2f4a7db4e410583c2893d24cb587766 build.gradle
> > 120000 blob cd70e37500a35663957cf60f011f81703be5d032 msrc
> > 040000 tree 658c892e15fbe0d3ea6b8490d9d54c5f2e658fc9 msrc
> > 100644 blob f623819c94a08252298220871ac0ba1118372e59 pom.xml
> > 100644 blob 9223cc2fddb138f691312c1ea2656b9dc17612d2 settings.gradle
> > 040000 tree c3bac1d92722bdee9588a27747b164baa275201f src

Looks like "msrc" is your duplicate entry (even though the sha1s of the
sub-entries are different, the tree cannot have two entries with the
same name). You can use the "log" trick above to find the full path to
it.

The fact that one is a symlink (mode 120000) and one is a tree means
that whatever git implementation created this presumably has a bug
related to symlinks.

The only way to fix it is to rewrite the history mentioning the tree
(because once the tree is fixed, it will get a new sha1, and then any
commit referencing it will get a new sha1, and commits built on that,
and so forth).

You can use "git filter-branch" to do so. There is a sample command
here:

  http://stackoverflow.com/questions/32577974/duplicate-file-error-while-pushing-mirror-into-git-repository/

that just rewrites each tree via a round-trip to the index (so it's not
clear which of the duplicate entries it will discard). You could also
write a more clever index-filter snippet to use git-update-index to
insert the entry you want.
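
In a tree with many entries the duplicate is harder to spot by eye than
it is above. A minimal sketch for picking it out mechanically, assuming
the default "git ls-tree" output format (a tab before the entry name);
"dup_names" is a made-up helper name, not a git command:

```shell
# dup_names: read `git ls-tree` output on stdin and print every entry
# name that occurs more than once. Hypothetical helper, not part of git.
dup_names() {
    # ls-tree separates the "<mode> <type> <sha1>" prefix from the name
    # with a tab, so everything after the first tab is the name.
    cut -f2- | sort | uniq -d
}
```

Piping "git ls-tree df79068051fa8702eae7e91635cca7eee1339002" into
dup_names should print just "msrc" for the tree quoted above. It only
reports the name; you still have to decide which of the two entries to
keep when rewriting.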

-Peff