j.sixt@xxxxxxxxxxxxx wrote on Fri, 29 Jul 2011 17:49 +0200:
> On 7/29/2011 17:31, Pete Wyckoff wrote:
> > I'm trying to use "git replace" to avoid cloning the entire set
> > of duplicate commits across a slow inter-site link.  Like this:
> >
> >     ...---A----B----C  site1/top
> >            \
> >             D---E---F  site1/proj
> >
> >     ...---A'---B'---C'  site2/top
> >
> > It is true that "git diff C C'" is empty:  they are identical.
> ...
> > I thought maybe I could "git fetch --depth=N" where N would cover
> > the range A'..site2/top, then replace.  But testing with "git
> > fetch --depth=3" still wants to fetch 100k objects.
>
> On site2, don't you want to 'git fetch --depth=N site1' such that F down
> to at least C (but not much more) is fetched, and then apply the graft or
> replacement on site2?

Yes, that makes sense: a shallow clone needs to pull the entire tree.

On site1 (bare .git repo):

    $ du -sm .
    542     .

    $ git merge-base site1/proj site1/top
    ff016f956ccae7878a1b322ba950a0088c6e2ded  ;# this is A

    $ git rev-list ff016f956ccae7878a1b322ba950a0088c6e2ded | wc
        566     566   23206

On site2:

    $ du -sm .git
    649     .git

    $ git rev-parse :/1384557
    0f95d91c37bc870d610b7bd45b316ab219750d31  ;# this is A'

    $ git rev-list 0f95d91c37bc870d610b7bd45b316ab219750d31 | wc
        566     566   23206

Same number of commits all the way back to the beginning of time,
but the timestamp in the root commit is different, so all the
SHA1s are different.

On site2:

    $ time git fetch git://site1/repo
    warning: no common commits
    remote: Counting objects: 124166, done.
    remote: Compressing objects: 100% (64472/64472), done.
    remote: Total 124166 (delta 59815), reused 121350 (delta 57062)
    Receiving objects: 100% (124166/124166), 462.31 MiB | 5.31 MiB/s, done.
    Resolving deltas: 100% (59815/59815), done.
    From git://site1/repo
     * branch            HEAD       -> FETCH_HEAD
    0m56.25s user 0m5.18s sys 2m29.45s elapsed 41.11 %CPU

A brand new repo on site2, cloning this time with a teensy depth:

    $ time git fetch --depth=3 git://site1/repo
    warning: no common commits
    remote: Counting objects: 96440, done.
    remote: Compressing objects: 100% (58844/58844), done.
    remote: Total 96440 (delta 36454), reused 92650 (delta 35169)
    Receiving objects: 100% (96440/96440), 415.87 MiB | 7.38 MiB/s, done.
    Resolving deltas: 100% (36454/36454), done.
    From git://site1/repo
     * branch            HEAD       -> FETCH_HEAD
    0m40.40s user 0m5.27s sys 1m50.29s elapsed 41.41 %CPU

No savings in data transport.  I was hoping it would be possible to
get just the changes, but walking back to FETCH_HEAD~3 shows that
it imports all the files.  That makes sense given the use case
for shallow clone.  But I want to tell the fetch machinery that I
already have one of the commits it is going to see.

I'll just tell people to put up with the full copy, and try to
fix things so that only one site creates the git repo from p4 in
the future.

Thanks for looking,

                -- Pete
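
For anyone who wants to see the SHA1 mismatch described above in isolation, here is a minimal sketch. It is not the p4 import itself: the two throwaway repos (named site1 and site2 under a temp directory), the file name, the identity, and the timestamps are all made up for illustration. It builds the same two-commit history twice, changing only the root commit's timestamp by one second, and then compares commit ids, tree ids, and commit counts.

```shell
#!/bin/sh
# Sketch only: repo names, file contents and dates are hypothetical.
set -e
work=$(mktemp -d)

# make_repo DIR ROOT_DATE: build an identical two-commit history in DIR;
# the root commit's timestamp is the only thing that varies.
make_repo() {
    git init -q "$1"
    (
        cd "$1"
        git config user.name "Example"
        git config user.email "example@example.invalid"
        echo "hello" > file.txt
        git add file.txt
        GIT_AUTHOR_DATE="$2" GIT_COMMITTER_DATE="$2" \
            git commit -q -m "root"
        echo "world" >> file.txt
        # The second commit gets the same fixed date in both repos.
        GIT_AUTHOR_DATE="1311944400 +0000" \
        GIT_COMMITTER_DATE="1311944400 +0000" \
            git commit -q -a -m "second"
    )
}

make_repo "$work/site1" "1311940800 +0000"
make_repo "$work/site2" "1311940801 +0000"   # root is one second later

commit1=$(git -C "$work/site1" rev-parse HEAD)
commit2=$(git -C "$work/site2" rev-parse HEAD)
tree1=$(git -C "$work/site1" rev-parse 'HEAD^{tree}')
tree2=$(git -C "$work/site2" rev-parse 'HEAD^{tree}')
count1=$(git -C "$work/site1" rev-list --count HEAD)
count2=$(git -C "$work/site2" rev-list --count HEAD)

echo "commits: $commit1 vs $commit2"
echo "trees:   $tree1 vs $tree2"
echo "counts:  $count1 vs $count2"
```

The tip commits differ even though the second commit's own metadata is identical in both repos, because each commit id covers its parent's id. The tree ids match, which is the small-scale version of "git diff C C'" being empty, and it is also why the fetch reports no common commits.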