Re: Reducing Git Repository size - git-filter-repo doesn't help

On Sun, Jan 8, 2023 at 6:54 PM fawaz ahmed0 <fawazahmed0@xxxxxxxxxxx> wrote:
>
> Hi,
>
> I have this huge repo: https://github.com/fawazahmed0/currency-api#readme  and I am trying to reduce its size.
>
> I have run filter-repo script on this repo (  https://github.com/fawazahmed0/currency-api/blob/1/.github/workflows/cleanup-repo.yml )

Why are you cleaning up in a CI task?  filter-repo is intended for a
one-shot "flag day" type cleanup, not something you repeatedly do.
Something seems a bit off already.

> The commits were reduced from 1k to 600 , but the space used is still same. (i.e size-pack: 6.47 GiB , https://github.com/fawazahmed95/currency-api/actions/runs/3865919157/jobs/6589710845#step:5:1498 )

You show the ending size, but not the starting.  Could you provide
that number and how you got it, so we can see what you're measuring
(especially since below it's not at all clear what you're even
measuring?)

The reduction in commits suggests it certainly did do some kind of
pruning, and you might also want to look at the output of running
"python3 git-filter-repo --analyze", both before and after filtering,
to get an idea of what is/was using lots of space.

Taking a closer look, I suspect you are missing some important
cleaning.  When a repository contains multiple identical copies of a
file, git stores that content only once.  Based on
https://github.com/fawazahmed0/currency-api/issues/55, all the files
that are now in directories that you are deleting used to be in the
root folder under another name.  The files with the old name aren't
going to be deleted by your pruning since you only requested that the
new names of the files be deleted.  If I'm understanding your
structure correctly (I didn't clone your repo or try this out; I'm
making inferences based on poking around at the links you provided and
looking at that issue), the upshot of that is that your filtering
probably won't shrink things much since you are still keeping a copy
of those files.  Again, "python3 git-filter-repo --analyze" both
before and after filtering will help you find these kinds of things
and/or other problems.
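
To illustrate why deleting only the new paths may free nothing: git
names a blob by a hash of its contents alone, so identical bytes under
two paths are a single object.  Here is a minimal Python sketch,
assuming the default SHA-1 object format; the function name and the
example paths are made up for illustration and are not taken from the
repository in question:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    # git hashes the header "blob <size>\0" followed by the raw bytes;
    # the file's path is not part of the input.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical example: the same data under an old and a new path.
tree = {
    "old-name.json": b"hello\n",
    "new-dir/new-name.json": b"hello\n",
}
ids = {path: git_blob_id(data) for path, data in tree.items()}

# Both paths map to one blob id, so removing only one path keeps the blob.
assert len(set(ids.values())) == 1
```

As long as any remaining commit still references the old path, the
blob stays reachable and cannot be pruned.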

> Almost all commits of this repo were applied on partially cloned repository: ( https://github.com/fawazahmed0/currency-api/blob/1/.github/workflows/run.yml )
> So I guess it had never run any git maintenance task in it's life.

How exactly are you measuring the size, given that you have a partial
clone?  You don't even have the objects in order to measure, so I
don't understand how you are measuring.  I'm even suspecting you are
measuring something else entirely; could you clarify all your size
measurements and how you got them?
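
For reference, one way to take a comparable local measurement before
and after filtering is to read the numbers from "git count-objects -v".
The following Python sketch is only an illustration; the helper names
are my own, and it reports only objects actually present in the local
clone, which is exactly why a partial clone will under-report:

```python
import subprocess


def parse_count_objects(output: str) -> dict:
    """Parse the "key: value" lines printed by `git count-objects -v`."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def on_disk_kib(repo_dir: str) -> int:
    """Return loose + packed object size in KiB for the repo at repo_dir."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "count-objects", "-v"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    stats = parse_count_objects(out)
    # "size" covers loose objects, "size-pack" covers packfiles (both KiB).
    return stats.get("size", 0) + stats.get("size-pack", 0)
```

Calling on_disk_kib() on the same full (non-partial) clone before and
after the filtering run gives two numbers that can be compared
directly.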

> I am not sure what needs to be done to reduce it's space utilization. ( https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#:~:text=less%20than%205%20GB%20is%20strongly%20recommended )

Note that git-filter-repo only changes the size of the _local_ repo.
You made an additional clone within GitHub Actions, and then
filter-repo shrinks *that* clone.  Even if you had deleted all copies
of older files you don't want anymore locally (which is suspect as I
noted above), your force pushing isn't going to shrink the size of the
repo on the server (i.e. on GitHub) since there are pull requests in
your repo that GitHub won't allow you to overwrite via force-push, and
those pull requests still hold on to the old history.
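
To check whether a remote still advertises pull request refs, one can
filter the output of "git ls-remote".  GitHub publishes pull request
heads under refs/pull/<number>/head; the sketch below, in Python with
helper names of my own choosing, simply lists such refs and is not a
way to remove them:

```python
import subprocess


def pull_request_refs(ls_remote_output: str) -> list:
    """Return ref names under refs/pull/ from `git ls-remote` output."""
    refs = []
    for line in ls_remote_output.splitlines():
        # Each line is "<object id><TAB><ref name>".
        _, sep, ref = line.partition("\t")
        if sep and ref.startswith("refs/pull/"):
            refs.append(ref)
    return refs


def remote_pull_request_refs(remote_url: str) -> list:
    """Run `git ls-remote` against remote_url and keep only refs/pull/*."""
    out = subprocess.run(
        ["git", "ls-remote", remote_url],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return pull_request_refs(out)
```

If this returns a non-empty list after a force push, the server is
still holding on to whatever history those refs reach.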

You probably want to read the "DISCUSSION" section of the filter-repo
manual, and you may also want to see GitHub's documentation on
shrinking repos, up at
https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository.
It appears you've skipped the whole "Fully removing the data from
GitHub" section of their documentation.


