I don't contribute much to the infra repo, although I do pull from time to time. Being able to send PRs is definitely cool, but I think waiting tens of seconds to pull a few commits is not very good.

On Tue, Sep 17, 2019 at 1:01 AM Randy Barlow <bowlofeggs@xxxxxxxxxxxxxxxxx> wrote:
>
> Greetings!
>
> Kevin asked me last week whether we are ready to move our
> infrastructure Ansible repository into repospanner. The benefit of
> moving it into repospanner is that it is one way to enable us to allow
> pull requests into the repository, which I think would be nice.
>
> repospanner seems to work correctly as a git server, but it does need
> improvements in its performance, so I offered to do a little
> benchmarking with our Ansible repo and repospanner to see what kind of
> performance we might see.
>
> I deployed a 3-node repospanner cluster today on fairly high
> performance hardware (SSD storage). It was three VMs on the same
> physical machine. Note that due to my test setup, network latency was
> about as good as it could get, and so was storage iops. I believe the
> performance bottlenecks will depend heavily on storage iops. Thus, this
> hardware is not really a great way to predict how the performance might
> be if we deployed into our infra, but it was easy for me to do and get
> a "best case" performance benchmark. I am willing to attempt to
> replicate this test on more realistic hardware in our infra if we want
> more realistic data for our own use case.
>
> I pushed the Ansible repository into it. This took a very long time:
> 298m2.157s! If we were to deploy nodes in different geos and use NAS
> storage, I believe this would take longer. The good thing is that we'd
> only need to do this operation once, if we were to decide to proceed.
>
> The next test was to see how long it takes to clone our repo. I did
> this on another machine on the same LAN (so again, ideal network
> latency) and it took 2m27.433s. That's a pretty long time too I'd say,
> but maybe liveable?
> This would impact every contributor who wanted to
> clone us, so I'll let the list debate whether that is acceptable.
>
> Next, I made a small commit (just added/deleted some lines) and pushed
> it into the cluster. This went reasonably quick at 0.366s, which I
> think we would be OK with.
>
> The last test I performed was to see how quickly another checkout could
> pull that commit, and this was again a speed I might consider to be a
> bit slow at 4.931s, especially considering that the commit was small
> and was only one. I would expect this to be somewhat proportional to
> the amount of change that has happened since the user last fetched, and
> this repo does see a lot of activity. So I might expect git pull to
> take 10's of seconds for contributors who are fairly active and pull
> once every few days or so, and maybe longer for users who pull less
> frequently.
>
> The repo copy I tested with has 199717 objects and 132918 deltas in it.
> repospanner performance seems to be fairly proportionally correlated
> with these numbers, as the bodhi repo pushed into it in about an hour
> and has 50kish objects, iirc (didn't write it down, so from memory).
>
> I personally am on the fence about whether we should proceed at this
> time. I am certain that people will notice the speed issues, and I also
> expect that it will be slower than the numbers I listed above since my
> tests were done on consumer hardware. But it would also be pretty sweet
> if we had pull requests on the repo.
>
> Improving repospanner's performance is a goal I am focusing on, so if
> we deployed it now I would hopefully be able to get it into better
> shape soon. Alternatively, we hopefully wouldn't have to wait that long
> if we wanted to wait for performance fixes before proceeding. I could
> see either decision being reasonable.
>
> To reiterate, I'd be willing to replicate the tests I did above on
> infra hardware if we are on the fence about the numbers I've reported
> here and want to see more realistic numbers to make a final decision. I
> think that would give us more realistic numbers since the tests I did
> here were on a much more ideal situation, performance wise.
>
> What do others think?
> _______________________________________________
> infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
> To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx