Migration to Git LFS inflates repository multiple times

Hi,

I'm posting here for the first time and I hope it's the right place to ask
questions about Git LFS.

TL;DR: Is it normal for a repository migrated to Git LFS to grow several times
in size, and how should I deal with it?

I'm migrating a big SVN repository to Git.
In SVN, a collection of third-party SDKs is maintained along with the codebase.
Many of the third-party libraries come in binary form,
so I'm migrating their binary files to Git LFS.

I'm following the Git LFS tutorial,
section "Migrating existing repository data to LFS"
https://github.com/git-lfs/git-lfs/wiki/Tutorial

First, I ran the initial translation of the SVN repo into Git.
The new repository is a Git bare repository.
There are 5 branches and 10+ tags in the proj.git repo.

It is quite large:

proj.git (BARE:master) $ du -sh
19G

Next, I performed the following sequence of steps to optimise it
and migrate to Git LFS:

1. Optimise the repo

proj.git (BARE:master) $ git gc
Enumerating objects: 1432599, done.
Counting objects: 100% (1432599/1432599), done.
Delta compression using up to 48 threads
Compressing objects: 100% (864524/864524), done.
Writing objects: 100% (1432599/1432599), done.
Total 1432599 (delta 541698), reused 1405922 (delta 525738)
Removing duplicate objects: 100% (256/256), done.
Checking connectivity: 1432599, done.

proj.git (BARE:master) $ du -sh
11G

2. List the file types taking up the most space in the repo

proj.git (BARE:master) $ git lfs migrate info --everything
migrate: Sorting commits: ..., done
migrate: Examining commits: 100% (29412/29412), done
*.lib   27 GB       3524/3524 files(s)  100%
*.pdb   5.6 GB      1412/1412 files(s)  100%
*.cpp   4.8 GB  131848/131854 files(s)  100%
*.exe   2.3 GB        798/798 files(s)  100%
*.dll   2.0 GB      1000/1000 files(s)  100%

3. Migrate the repo to Git LFS

proj.git (BARE:master) $ git lfs migrate import \
    --include="*.exe,*.dll,*.lib,*.pdb,*.zip" --everything
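
A quick way to check that the import actually rewrote a given file is to look
at the blob stored in history: after migration it should be a small Git LFS
pointer rather than the binary itself. Below is a minimal sketch. The helper
name `is_lfs_pointer` and the path in the usage comment are hypothetical, and
the check only assumes the pointer layout from the Git LFS v1 spec (a
`version` line, an `oid sha256:` line and a `size` line).

```shell
# Succeeds (exit 0) if stdin looks like a Git LFS v1 pointer file.
# Only the first 1024 bytes are inspected, since pointer files are tiny.
is_lfs_pointer() {
    head -c 1024 | awk '
        NR == 1 && $0 == "version https://git-lfs.github.com/spec/v1" { v = 1 }
        /^oid sha256:[0-9a-f]+$/ { o = 1 }
        /^size [0-9]+$/          { s = 1 }
        END { exit !(v && o && s) }
    '
}

# Hypothetical usage inside the migrated repo (the path is made up):
#   git cat-file -p master:sdk/foo.lib | is_lfs_pointer && echo "pointer"
```

If the check fails for a file matched by the --include patterns, that file was
probably not converted on that ref.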

4. Check size of the repo after migration to Git LFS

proj.git (BARE:master) $ du -sh
47G
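
Since `du -sh` at the top of a bare repository counts everything under it, it
may help to split the total into the Git object store and the Git LFS store.
A rough sketch, assuming Git LFS keeps its local objects under `lfs/objects`
inside the bare repository (worth verifying with `git lfs env`). The function
name `repo_size_breakdown` is made up for illustration.

```shell
# Print the disk usage of the Git object store and the Git LFS store separately.
# $1: path to a bare repository (defaults to the current directory).
repo_size_breakdown() {
    repo=${1:-.}
    for sub in objects lfs/objects; do
        if [ -d "$repo/$sub" ]; then
            # du -sk reports kibibytes and behaves the same across du variants
            kb=$(du -sk "$repo/$sub" | awk '{ print $1 }')
            printf '%s\t%s KiB\n' "$sub" "$kb"
        else
            printf '%s\t(absent)\n' "$sub"
        fi
    done
}

# Hypothetical usage from inside proj.git:
#   repo_size_breakdown .
```

The two numbers show how much of the total is ordinary Git data and how much
is LFS content.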

5. Clean up the `.git` directory after migration to Git LFS

proj.git (BARE:master) $ git reflog expire --expire-unreachable=now --all

proj.git (BARE:master) $ git gc --prune=now --aggressive
Enumerating objects: 1462310, done.
Counting objects: 100% (1462310/1462310), done.
Delta compression using up to 48 threads
Compressing objects: 100% (1422322/1422322), done.
Writing objects: 100% (1462310/1462310), done.
Total 1462310 (delta 577640), reused 845097 (delta 0)
Removing duplicate objects: 100% (256/256), done.
Checking connectivity: 1462310, done.

6. Check final disk size of the repo

proj.git (BARE:master) $ du -sh
39G
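
For comparison, the sizes that `git lfs migrate info` reported in step 2 for
the migrated file types can be added up. This is only a back-of-the-envelope
sketch: it uses the four figures listed in step 2 (`*.zip` did not appear in
that top-5 list, so it is left out) and makes no claim about how Git or
Git LFS store the data. The helper name `sum_gb` is hypothetical.

```shell
# Sum sizes given in GB, one per argument, and print the total to one decimal.
sum_gb() {
    printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Figures from step 2: *.lib, *.pdb, *.exe, *.dll
sum_gb 27 5.6 2.3 2.0   # prints 36.9
```

The total, roughly 37 GB, is in the same ballpark as the 39G that `du -sh`
shows after the clean-up.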

7. List the file types taking up the most space in the repository
after migration to Git LFS

proj.git (BARE:master) $ git lfs migrate info --everything
migrate: Sorting commits: ..., done
migrate: Examining commits: 100% (29412/29412), done
*.cpp   4.8 GB  131848/131854 files(s)  100%
*.png   1.1 GB  696499/696499 files(s)  100%
*.h     828 MB    86386/86471 files(s)  100%
*.csv   820 MB        939/939 files(s)  100%
*.html  686 MB    34126/34126 files(s)  100%


Now, I'm looking for answers to the following questions:

1. Is the procedure presented above correct for migrating (SVN ->) Git -> Git LFS?

2. Given that the initial translation to Git generated a 19 GB repo
(optimised to 11 GB),
is it normal that the Git LFS migration inflates the repository
to 47 GB (optimised to 39 GB)?

3. Why does the inflation happen? Is it a function of the number of branches?
   How should I understand the jump from 11 GB to 39 GB?

4. How can I optimise the repository to cut the size down further?

My next step is to somehow push the fat pig into GitHub, Bitbucket or
Azure DevOps ;-)

I've used Git for a few years, but I'm pretty much a newbie when it comes to
low-level or administration tasks, so I might have made basic errors.
I'll be thankful for any feedback.

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net


