Re: help needed: Splitting a git repository after subversion migration

Thomas Jarosch venit, vidit, dixit 07.12.2008 18:41:
> Hello together,
> 
> I've successfully imported a large subversion repository into git.
> The tree contains source code and binary data ("releases"),
> the resulting .git directory is about 11GB.
> 
> After the import I recreated the tags/branches by converting the refs
> to the subversion tags using a small shell script from the web:
> 
> for branch in `git branch -r`; do
>      ...
>      version=`basename $branch`
>      git tag -s -f -m "$subject" "$version" "$branch^"
>      git branch -d -r $branch
> done
> 
> Ok, so far everything went really smooth. I wanted to split this repository
> into two repositories, one for the source code and one for the binary data.
> The current tree layout is like this:
> 
> sources/c++_xyz
> releases/large_binary_data
> ...
> 
> The original tree was imported from CVS to subversion and the layout
> of the trunk was once reorganized/moved later. Here's the command
> I used to split out the "source" tree:
> 
> git filter-branch --index-filter \
>     'git rm --cached --ignore-unmatch -r -f CVSROOT Attic source/Attic develpkg/Attic source/packages/Attic releases update_pkg' \
>     -- --all
> 
> After that I ran these commands to reclaim the space:
> - git clone --no-hardlinks filtered_tree final_output
> - cd final_output
> - git gc
> - git prune
> - git repack -a -d --depth=250 --window=250
> 
> Unfortunately the .git directory of the "source" tree is still 7.5GB big.
> 
> When I just imported the "trunk" from subversion without any tags
> and then ran "git filter-branch --subdirectory-filter source" + git gc,
> the .git directory was about 1.5GB afterwards.
> 
> How can I find out where those other 6GB go to?
> I already looked at the tags with gitk,
> there's no sign of the releases/* stuff left.

I strongly suspect the reorganization/move is the cause. Most probably
some releases were put in places where you don't expect them, and
therefore they are not filtered out by removing the releases subdir.
If they have distinctive file names (say you know a name from before
the move) you can find them using "git log --all -- <name>". Or use
gitk --all, switch to "tree display" and look for unexpected files in
the earliest revisions.
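
If you don't know any of the old names, listing the biggest blobs that
are still reachable from any ref may point you to them. This is only a
rough sketch: the function name largest_blobs is made up for this
mail, and it assumes a git that supports "cat-file --batch-check" with
a format string. It has to be run inside the filtered repository.

```shell
# Print the N largest blobs reachable from any ref as "size sha1 path".
# "largest_blobs" is a hypothetical helper name; N defaults to 20.
largest_blobs() {
    n=${1:-20}
    # rev-list --objects prints "<sha1> <path>" for every reachable object;
    # cat-file looks up type and size and passes the path through as %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |     # keep blobs, drop commits, trees and tags
        cut -d' ' -f2- |         # strip the type column
        sort -n -r |             # largest size first
        head -n "$n"
}
```

Calling "largest_blobs 20" and looking at the paths should show whether
release files are still around under some other directory.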

Also, it may be better to create the tags (from the tags/... branches)
after the filter-branch. If you don't rewrite the tags (have you?), they
will still point to the original commits (before the rewrite) and
therefore keep all the "fat blobs" reachable. The best way to avoid this
is to create the tags after the rewrite.
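
To check whether any tag still points at an unfiltered tree, something
like the following could be used. Again just a sketch: tags_with_path
is a name I made up, and "releases" in the usage line is taken from
your layout above.

```shell
# List every tag whose tree still contains the given top-level path.
# "tags_with_path" is a hypothetical helper name.
tags_with_path() {
    path=$1
    git for-each-ref --format='%(refname)' refs/tags |
    while read -r tag; do
        # ls-tree prints the entry only if the path exists in that tag's tree
        if [ -n "$(git ls-tree --name-only "$tag" -- "$path")" ]; then
            printf '%s\n' "$tag"
        fi
    done
}
```

Running "tags_with_path releases" in the filtered repository should
print nothing if all tags were rewritten. Any tag it does print still
refers to a commit from before the rewrite.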

Michael