On Sun, Jan 8, 2023 at 6:54 PM fawaz ahmed0 <fawazahmed0@xxxxxxxxxxx> wrote:
>
> Hi,
>
> I have this huge repo: https://github.com/fawazahmed0/currency-api#readme and I am trying to reduce its size.
>
> I have run filter-repo script on this repo ( https://github.com/fawazahmed0/currency-api/blob/1/.github/workflows/cleanup-repo.yml )

Why are you cleaning up in a CI task? filter-repo is intended for a one-shot "flag day" type cleanup, not something you repeatedly do. Something seems a bit off already.

> The commits were reduced from 1k to 600 , but the space used is still same. (i.e size-pack: 6.47 GiB , https://github.com/fawazahmed95/currency-api/actions/runs/3865919157/jobs/6589710845#step:5:1498 )

You show the ending size, but not the starting size. Could you provide that number and how you got it, so we can see what you're measuring (especially since, as discussed below, it's not at all clear what you're even measuring)? The reduction in commits suggests it certainly did do some kind of pruning. You might also want to look at the output of running "python3 git-filter-repo --analyze", both before and after filtering, to get an idea of what is/was using lots of space.

Taking a closer look, I suspect you are missing some important cleaning. When there are multiple copies of a file in a repository, git only stores one version, because objects are identified by their content rather than by their path. Based on https://github.com/fawazahmed0/currency-api/issues/55, all the files that are now in directories that you are deleting used to be in the root folder under another name. The files with the old name aren't going to be deleted by your pruning, since you only requested that the new names of the files be deleted. If I'm understanding your structure correctly (I didn't clone your repo or try this out; I'm making inferences based on poking around at the links you provided and looking at that issue), the upshot is that your filtering probably won't shrink things much, since you are still keeping a copy of those files.
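
To make the point about old and new names concrete, here is a minimal sketch (not something I ran against your repo). It assumes the default SHA-1 object format, and the paths and file contents are hypothetical stand-ins for your layout. It computes blob IDs the way git does, from the content alone, and shows that dropping every path under one directory still leaves a blob referenced if the same content also exists under an old root-level name.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Blob ID as git computes it with SHA-1: hash of 'blob <size>\\0' + content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def surviving_blobs(history, deleted_prefix):
    """Blob IDs still referenced after dropping every path under deleted_prefix."""
    return {
        git_blob_id(data)
        for path, data in history
        if not path.startswith(deleted_prefix)
    }


# Hypothetical (path, content) pairs seen somewhere in history.
history = [
    ("usd.json", b'{"usd": 1}\n'),         # old name in the root folder
    ("latest/usd.json", b'{"usd": 1}\n'),  # same content under the new name
    ("latest/eur.json", b'{"eur": 1}\n'),  # content that only exists under latest/
]

all_blobs = {git_blob_id(data) for _, data in history}
kept = surviving_blobs(history, "latest/")

# The usd blob is still referenced through the old root-level path,
# so removing latest/ alone cannot free it.
print(len(all_blobs), len(kept))
print(git_blob_id(b'{"usd": 1}\n') in kept)
```

In this toy history only the eur blob becomes unreferenced. The old root-level paths would have to be filtered as well for the usd blob to go away.
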
Again, "python3 git-filter-repo --analyze" both before and after filtering will help you find these kinds of things and/or other problems.

> Almost all commits of this repo were applied on partially cloned repository: ( https://github.com/fawazahmed0/currency-api/blob/1/.github/workflows/run.yml )
> So I guess it had never run any git maintenance task in it's life.

How exactly are you measuring the size, given that you have a partial clone? You don't even have the objects in order to measure, so I don't understand how you are measuring. I'm even suspecting you are measuring something else entirely; could you clarify all your size measurements and how you got them?

> I am not sure what needs to be done to reduce it's space utilization. ( https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#:~:text=less%20than%205%20GB%20is%20strongly%20recommended )

Note that git-filter-repo only changes the size of the _local_ repo. You made an additional clone within GitHub Actions, and then filter-repo shrinks *that* clone. Even if you had deleted all copies of older files you don't want anymore locally (which is suspect as I noted above), your force pushing isn't going to shrink the size of the repo on the server (i.e. on GitHub), since there are pull requests in your repo that GitHub won't allow you to overwrite via force-push, and those pull requests still hold on to the old history.

You probably want to read the "DISCUSSION" section of the filter-repo manual, and you may also want to see GitHub's documentation on shrinking repos, up at https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository. It appears you've skipped the whole "Fully removing the data from GitHub" section of their documentation.
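
If you want to see which server-side refs are involved, a rough sketch along these lines may help. It assumes GitHub exposes pull requests as refs under refs/pull/, and it relies on the tab-separated "<object id> <ref name>" lines that "git ls-remote" prints. The function names are mine, and the sample object IDs below are made up for illustration.

```python
import subprocess


def pull_request_refs(ls_remote_output: str) -> list[str]:
    """Return the ref names under refs/pull/ found in `git ls-remote` output."""
    refs = []
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        # Each line is "<object id>\t<ref name>".
        _oid, _, ref = line.partition("\t")
        if ref.startswith("refs/pull/"):
            refs.append(ref)
    return refs


def list_remote_pull_refs(url: str) -> list[str]:
    """Ask the remote for its refs (needs git and network access) and filter them."""
    result = subprocess.run(
        ["git", "ls-remote", url],
        check=True,
        capture_output=True,
        text=True,
    )
    return pull_request_refs(result.stdout)


# Made-up sample output, only to show the parsing step.
sample = (
    "1111111111111111111111111111111111111111\tHEAD\n"
    "1111111111111111111111111111111111111111\trefs/heads/1\n"
    "2222222222222222222222222222222222222222\trefs/pull/7/head\n"
)
print(pull_request_refs(sample))
```

Any ref this reports still points at history that a force-push to your branches does not touch, which is why the server-side size does not drop.
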