Greetings!

Kevin asked me last week whether we are ready to move our infrastructure Ansible repository into repospanner. One benefit of the move is that it would let us accept pull requests on the repository, which I think would be nice. repospanner seems to work correctly as a git server, but its performance needs improvement, so I offered to do a little benchmarking with our Ansible repo to see what kind of performance we might expect.

Today I deployed a 3-node repospanner cluster on fairly high-performance hardware (SSD storage): three VMs on the same physical machine. Because of this setup, network latency was about as good as it could get, and so were storage IOPS. I believe the performance bottlenecks will depend heavily on storage IOPS, so this hardware is not a great predictor of how repospanner would perform if we deployed it in our infra. It was easy to set up, though, and it gives a "best case" benchmark. I am willing to try to replicate this test on more realistic hardware in our infra if we want data closer to our own use case.

First, I pushed the Ansible repository into the cluster. This took a very long time: 298m2.157s, or nearly five hours! If we were to deploy nodes in different geos and use NAS storage, I believe it would take even longer. The good news is that we would only need to do this once, if we decide to proceed.

The next test was to see how long it takes to clone our repo. I did this from another machine on the same LAN (so again, ideal network latency), and it took 2m27.433s. That's a pretty long time too, I'd say, but maybe livable? This would affect every contributor who wants to clone the repo, so I'll let the list debate whether that is acceptable.

Next, I made a small commit (just adding and deleting a few lines) and pushed it into the cluster. This was reasonably quick at 0.366s, which I think we would be OK with.
The last test I performed was to see how quickly another checkout could pull that commit. At 4.931s, this again seemed a bit slow to me, especially since it was a single small commit. I would expect this time to be roughly proportional to the amount of change since the user last fetched, and this repo sees a lot of activity. So I might expect git pull to take tens of seconds for fairly active contributors who pull every few days or so, and possibly longer for those who pull less frequently.

The repo copy I tested with has 199717 objects and 132918 deltas. repospanner's performance seems to be roughly proportional to these numbers: the bodhi repo, which has 50kish objects iirc (I didn't write it down, so this is from memory), pushed in about an hour.

I am personally on the fence about whether we should proceed at this time. I am certain that people will notice the speed issues, and I expect real-world performance to be slower than the numbers above, since my tests were done on consumer hardware in a near-ideal setup (SSD storage, three VMs on one physical machine). But it would also be pretty sweet to have pull requests on the repo. Improving repospanner's performance is a goal I am focusing on, so if we deployed it now, I would hopefully be able to get it into better shape soon. Alternatively, if we'd rather wait for performance fixes before proceeding, hopefully we wouldn't have to wait long. I could see either decision being reasonable.

To reiterate, I'd be willing to replicate these tests on infra hardware if we are on the fence about the numbers reported here and want more realistic ones before making a final decision, since the tests I did were run under much more ideal conditions, performance-wise.

What do others think?
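As a rough back-of-the-envelope check of the claim that push time scales with object count, the push throughput of the two repos can be compared. This is only a sketch: the Ansible figures are the measured ones from this email, while the bodhi figures are assumptions standing in for "50kish objects" and "about an hour" (taken here as exactly 50000 objects and 60 minutes), and linear scaling is itself the hypothesis being checked.

```python
# Back-of-the-envelope push throughput for the repospanner benchmarks.
# Ansible numbers are measured; bodhi numbers are ASSUMED approximations
# of "50kish objects" and "about an hour" (recalled from memory).

def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a `time`-style reading such as 298m2.157s into seconds."""
    return minutes * 60 + seconds

ansible_objects = 199_717
ansible_push_s = to_seconds(298, 2.157)  # measured: 298m2.157s

bodhi_objects = 50_000                   # assumed: "50kish objects"
bodhi_push_s = to_seconds(60, 0.0)       # assumed: "about an hour"

# Objects pushed per second; similar rates support linear scaling.
ansible_rate = ansible_objects / ansible_push_s
bodhi_rate = bodhi_objects / bodhi_push_s

# Bodhi push time predicted from the Ansible rate, if scaling is linear.
predicted_bodhi_min = bodhi_objects / ansible_rate / 60

print(f"ansible: {ansible_rate:.1f} objects/s")
print(f"bodhi:   {bodhi_rate:.1f} objects/s")
print(f"predicted bodhi push: {predicted_bodhi_min:.0f} min")
```

With these inputs the two rates come out at roughly 11 and 14 objects per second, and the Ansible rate predicts a bodhi push of a bit over an hour, which is at least consistent with rough proportionality given how approximate the bodhi numbers are.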
_______________________________________________
infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx