Hi all,

I tried out the BFG tool on my GitHub fork of the glusterdocs project.

$ git count-objects -vH | grep size-pack
size-pack: 62.88 MiB
$ cd ..
$ java -jar bfg-1.12.12.jar --delete-files '*.{odp,pdf}' glusterdocs.git
$ cd glusterdocs.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ git count-objects -vH | grep size-pack
size-pack: 2.52 MiB

As seen above, the repo size was reduced from about 62.9 MiB to about 2.5 MiB.

Caveat: If we do go with this approach, git history is rewritten, so every
contributor will have to re-fork the "cleaned" repo. There are about 60 forks
now on GitHub. Consequently, anyone sending a PR will first have to create a
fresh clone.

Is this "reset" worth it given the slight confusion and one-time inconvenience
to contributors?

Thoughts?

Regards,
 -Prashanth Pai

----- Original Message -----
> From: "Amye Scavarda" <amye@xxxxxxxxxx>
> To: "Nigel Babu" <nigelb@xxxxxxxxxx>
> Cc: "Humble Chirammal" <hchiramm@xxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Tuesday, May 17, 2016 6:14:23 PM
> Subject: Re: Reducing the size of the glusterdocs git repository
>
> On Tue, May 17, 2016 at 6:02 PM, Nigel Babu < nigelb@xxxxxxxxxx > wrote:
>
> We could potentially set up travis-ci to do builds that'll fail loudly if we
> commit something that throws a warning. I've tried out the possibility here:
>
> https://travis-ci.org/nigelbabu/glusterdocs/jobs/130816121
>
> I've purposefully made it fail. Success looks like this:
>
> https://travis-ci.org/nigelbabu/glusterdocs/jobs/130815368
>
> We can, in the future, add stuff so that documentation has working links and
> there are no large files checked in. If there's interest, happy to send a
> pull request for this.
>
> I like this a lot.
> It's a way to make sure we're not putting in things that haven't been
> thoroughly checked.
>
> PR welcome.
>
> - amye
>
> On Tue, May 17, 2016 at 4:55 PM, Amye Scavarda < amye@xxxxxxxxxx > wrote:
>
> On Tue, May 17, 2016 at 3:59 PM, Amye Scavarda < amye@xxxxxxxxxx > wrote:
>
> On Tue, May 17, 2016 at 3:56 PM, Niels de Vos < ndevos@xxxxxxxxxx > wrote:
>
> On Tue, May 17, 2016 at 02:42:27PM +0530, Amye Scavarda wrote:
> > Hi all,
> >
> > So we have a new slideshare.net account, GlusterCommunity
> > (http://www.slideshare.net/GlusterCommunity/) that connects with the
> > Gluster.org G+ community - and it'll even connect with the YouTube channel!
> >
> > I've submitted a PR to the glusterdocs repo that will need some review: it
> > removes all of the presentations from the repo and links to slideshare.
> > (https://github.com/gluster/glusterdocs/pull/109)
>
> Cool, but note that the size of the repository does not decrease with
> that commit. The git repository will still contain all the presentations
> in the history/log. But not adding any more presentations is a good step
> already :-)
>
> You are correct, but it will not make the current issue worse. It would help
> if I actually hit 'reply all'.
>
> > In no way does this mean that anyone needs to use Slideshare to host PDFs
> > of slides, you can use whatever you want. I chose slideshare because there
> > was an older Gluster account that had some Gluster.com presentations and it
> > links with YouTube.
> >
> > Thoughts?
>
> Looks good to me, but maybe you can address this comment in the GitHub
> pull request:
> https://github.com/gluster/glusterdocs/pull/109/files#r63498585
>
> That's why I have you all to proofread.
>
> One thing I'm noticing, we don't have any sort of CI on Read The Docs. Let me
> see if there's not an easy way to fix that and have TravisCI tell us if
> we're about to merge something with a bunch of borked links.
> -- a
>
> - amye
>
> Thanks,
> Niels
>
> - amye
>
> > On Thu, May 12, 2016 at 7:49 PM, Niels de Vos < ndevos@xxxxxxxxxx > wrote:
> >
> > > On Thu, May 12, 2016 at 03:55:23PM +0530, Kaushal M wrote:
> > > > On Thu, May 12, 2016 at 1:25 PM, Niels de Vos < ndevos@xxxxxxxxxx > wrote:
> > > > > On Thu, May 12, 2016 at 02:56:52AM -0400, Prashanth Pai wrote:
> > > > >>
> > > > >> > > Right now, even cloning the main docs branch is a huge pain
> > > > >> > > due to the size of the repo.
> > > > >> > > I think that branching will not solve this problem, and might
> > > > >> > > make the problem worse.
> > > > >> >
> > > > >> > Branching would not increase the size of the repository itself.
> > > > >> > Only the size used on RTD will be bigger as the HTML for
> > > > >> > different branches will be generated (so the content is there
> > > > >> > 2x). Cloning the repository is not affected.
> > > > >> >
> > > > >> > Deleting files (like the presentations) will also not remove
> > > > >> > them from the git repository. It will stay possible to checkout
> > > > >> > an older version of the docs from the same repository, all of
> > > > >> > the history is downloaded once the repository is cloned.
> > > > >> >
> > > > >> > In order to reduce the size of the repository, you need to
> > > > >> > create a new one, and import the changes without the big files.
> > > > >> > While importing changes from another (the current) repository,
> > > > >> > it is possible to modify the changes on the fly and prevent
> > > > >> > importing the big files. This keeps the history and the credits
> > > > >> > for the contributors.
> > > > >>
> > > > >> This is an alternative solution:
> > > > >> https://rtyley.github.io/bfg-repo-cleaner/
> > > > >
> > > > > Right, I was thinking about git-filter-branch.
> > > > > In the end, I am pretty sure that the old/original repository is
> > > > > not valid anymore. I expect that 'git rebase' is used for the
> > > > > cleaning, and that will change the commit-ids of patches that
> > > > > follow after a 'cleaned' patch.
> > > > >
> > > > > My recommendation for a separate repository is only for
> > > > > preventing inconsistencies between the upstream repository (after
> > > > > cleaning) and the previously cloned/forked repositories that
> > > > > contributors have.
> > > > >
> > > > >> > Where would you suggest the presentations (and other files?)
> > > > >> > should get located?
> > > > >>
> > > > >> Maybe an official Gluster community slideshare or speakerdeck
> > > > >> account?
> > > > >
> > > > > Possibly something like this. But we should have a plan for the
> > > > > existing presentations too. And we have to accept that not
> > > > > everyone presenting about a Gluster (related) topic will use 'our'
> > > > > SaaS instance.
> > > > >
> > > > >> Git LFS is also an option but we don't really need versioning for
> > > > >> presentation files. Git LFS will keep large files in a separate
> > > > >> location and keep a "pointer" to those in the repo.
> > > > >
> > > > > I'd prefer something like this. Most of my presentations are
> > > > > written while I'm travelling, so a connected service is not really
> > > > > an option for me in any case.
> > > >
> > > > The docs repo should just have links to the presentations.
> > > > They could be hosted on slideshare/speakerdeck, google drive or they
> > > > could be hosted html5 presentations.
> > > > If required we could just host the presentations on
> > > > download.gluster.org.
> > > > I've seen it being used to host resources for tutorials previously
> > > > (like disk images),
> > > > so hosting the actual presentations shouldn't be too hard.
> > >
> > > I really do not care where they are hosted.
> > > We just cannot demand the use of a SaaS for them. We can offer the
> > > option of course, but still allow presenters to use the tool of their
> > > preference.
> > >
> > > Niels
> > >
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel@xxxxxxxxxxx
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> >
> > --
> > Amye Scavarda | amye@xxxxxxxxxx | Gluster Community Lead
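
P.S. Nigel's suggestion earlier in the thread, having CI fail when large files
are checked in, could be sketched roughly as below. This is only an
illustration and not something proposed in the thread: the function names
find_large_files and check_large_files and the 1 MiB limit in the usage
comment are made up here, and the check only looks at the working tree, not
at git history.

```shell
#!/bin/sh
# Hypothetical helpers for a CI step; names and limits are assumptions.

# Print every regular file under directory $1 that is larger than $2 bytes,
# skipping anything inside a .git directory.
find_large_files() {
    dir="$1"
    limit_bytes="$2"
    find "$dir" -type f ! -path '*/.git/*' -size +"${limit_bytes}c" -print
}

# Return non-zero (and list the offenders on stderr) if any file under $1
# is larger than $2 bytes, so that a CI job running this step would fail.
check_large_files() {
    offenders=$(find_large_files "$1" "$2")
    if [ -n "$offenders" ]; then
        echo "Files larger than $2 bytes:" >&2
        echo "$offenders" >&2
        return 1
    fi
    return 0
}

# Example use in a CI script (1 MiB limit chosen arbitrarily):
#   check_large_files . 1048576 || exit 1
```

A check like this would only stop new large files from landing; it does
nothing about what is already in the history, which is what the BFG run
above addresses.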