Re: repospanner and our Ansible repo

Igor Gnatenko <ignatenkobrain@xxxxxxxxxxxxxxxxx> · Tue, 17 Sep 2019 07:48:10 +0200



I don't contribute much to the infra repo, although I do pull from time
to time. Being able to send PRs is definitely cool, but I think waiting
tens of seconds to pull a few commits is not very good.

On Tue, Sep 17, 2019 at 1:01 AM Randy Barlow
<bowlofeggs@xxxxxxxxxxxxxxxxx> wrote:
>
> Greetings!
>
> Kevin asked me last week whether we are ready to move our
> infrastructure Ansible repository into repospanner. The benefit of
> moving it into repospanner is that it is one way to enable us to allow
> pull requests into the repository, which I think would be nice.
>
> repospanner seems to work correctly as a git server, but it does need
> improvements in its performance, so I offered to do a little
> benchmarking with our Ansible repo and repospanner to see what kind of
> performance we might see.
>
> I deployed a 3-node repospanner cluster today on fairly
> high-performance hardware (SSD storage). It was three VMs on the same
> physical machine. Note that due to my test setup, network latency was
> about as good as it could get, and so was storage iops. I believe the
> performance bottlenecks will depend heavily on storage iops. Thus, this
> hardware is not really a great way to predict how the performance might
> be if we deployed into our infra, but it was easy for me to do and get
> a "best case" performance benchmark. I am willing to attempt to
> replicate this test on more realistic hardware in our infra if we want
> more realistic data for our own use case.
>
> I pushed the Ansible repository into it. This took a very long time:
> 298m2.157s! If we were to deploy nodes in different geos and use NAS
> storage, I believe this would take longer. The good thing is that we'd
> only need to do this operation once, if we were to decide to proceed.
>
> The next test was to see how long it takes to clone our repo. I did
> this on another machine on the same LAN (so again, ideal network
> latency) and it took 2m27.433s. That's a pretty long time too I'd say,
> but maybe liveable? This would impact every contributor who wanted to
> clone us, so I'll let the list debate whether that is acceptable.
>
> Next, I made a small commit (just added/deleted some lines) and pushed
> it into the cluster. This went reasonably quickly at 0.366s, which I
> think we would be OK with.
>
> The last test I performed was to see how quickly another checkout could
> pull that commit, and this was again a speed I might consider to be a
> bit slow at 4.931s, especially considering that the commit was small
> and was only one. I would expect this to be somewhat proportional to
> the amount of change that has happened since the user last fetched, and
> this repo does see a lot of activity. So I might expect git pull to
> take tens of seconds for contributors who are fairly active and pull
> once every few days or so, and maybe longer for users who pull less
> frequently.
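
For anyone who wants to repeat the clone and pull timings against their own checkout, here is a rough sketch of how they could be measured. It assumes only that `git` is on the PATH; `repo_url` and `checkout_dir` are placeholders, the helper names are made up for illustration, and nothing here is specific to repospanner or to how the numbers above were actually taken.

```python
import subprocess
import time


def time_command(argv, cwd=None):
    """Run argv once and return (wall-clock seconds, exit code)."""
    start = time.perf_counter()
    result = subprocess.run(
        argv, cwd=cwd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


def benchmark_clone_and_pull(repo_url, checkout_dir):
    """Time a fresh clone, then a pull inside the new checkout.

    repo_url and checkout_dir are hypothetical placeholders.
    """
    timings = {}
    clone_s, clone_rc = time_command(["git", "clone", repo_url, checkout_dir])
    timings["clone"] = (clone_s, clone_rc)
    # Only pull if the clone succeeded, otherwise checkout_dir may not exist.
    if clone_rc == 0:
        timings["pull"] = time_command(["git", "pull"], cwd=checkout_dir)
    return timings
```

A single run like this is noisy, so repeating each operation a few times and comparing the spread would give a fairer picture than one sample.
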
>
> The repo copy I tested with has 199717 objects and 132918 deltas in it.
> repospanner performance seems to be roughly proportional to these
> numbers, as the bodhi repo was pushed into it in about an hour and has
> 50kish objects, iirc (didn't write it down, so from memory).
>
> I personally am on the fence about whether we should proceed at this
> time. I am certain that people will notice the speed issues, and I also
> expect that it will be slower than the numbers I listed above since my
> tests were done on consumer hardware. But it would also be pretty sweet
> if we had pull requests on the repo.
>
> Improving repospanner's performance is a goal I am focusing on, so if
> we deployed it now I would hopefully be able to get it into better
> shape soon. Alternatively, we hopefully wouldn't have to wait that long
> if we wanted to wait for performance fixes before proceeding. I could
> see either decision being reasonable.
>
> To reiterate, I'd be willing to replicate the tests I did above on
> infra hardware if we are on the fence about the numbers I've reported
> here and want to see more realistic numbers to make a final decision. I
> think that would give us more realistic numbers since the tests I did
> here were in a much more ideal situation, performance-wise.
>
> What do others think?
> _______________________________________________
> infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
> To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx