On Wed, Sep 18, 2019 at 9:58 AM Stephen John Smoogen <smooge@xxxxxxxxx> wrote:
>
> On Wed, 18 Sep 2019 at 09:44, Randy Barlow <bowlofeggs@xxxxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, 2019-09-17 at 19:01 -0400, Neal Gompa wrote:
> > > Out of curiosity, do we know where the bottlenecks are in
> > > repoSpanner?
> > > In theory, the architecture of repoSpanner isn't supposed to be too
> > > different from gitaly, so I'm curious where we're falling down.
> >
> > I believe it needs a more efficient way to store the git objects. As I
> > understand it, it currently stores each one in its own file, resulting
> > in a large number of small files.
>
> So my "hot-take probably wrong" look at things seems to indicate that
> the reason it stores everything as a separate file is to make certain
> git actions faster. When you pack the files, searches, diffs and other
> checks become slower or memory intensive because you have to calculate
> new deltas and other things 'lost' in the packing.
>
> Looking at the gitaly documents, I think that is the reason they have
> multiple different types of in-memory caches at different layers. It
> allows for both faster accesses but probably blows up the size of what
> is needed for hardware. We have to be careful here because we don't
> have a hardware reserve to dive into for more memory/cpu.
>
> I think that for gitlab.org (versus running a local gitlab) they also
> use a lot of backend 'eventual' consistency caching. You push and it
> begins to spread that out through the multiple regions it is housed.
> The 'user' doesn't see this because the front end level just directs
> you to the known hot caches for that particular pull/push request..
> but if you somehow were hardcoded to a region you might not see the
> update/change for a while because it hasn't mirrored out completely.
> That also would speed up push/pull/changes greatly and not something
> we could 'duplicate'.
>

That definitely explains why performance is consistent between
repoSpanner and gitaly on my local deployment. So it's most likely
related to how they simulate better performance as the backend catches
up.

That said, the most recent change to gitaly is that it now uses hashed
storage for git objects and does "fast forking" using alternates,
instead of storing bare git repos and duplicating them on disk. None of
that changes the initial push for a unique repo.

-- 
真実はいつも一つ!/ Always, there's only one truth!
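
P.S. For anyone who wants to see the loose-object, pack, and alternates
mechanics mentioned above on an ordinary git repository, here is a
minimal sketch. It assumes the standard on-disk layout of a stock git
object directory (objects/<2 hex>/<rest>, objects/pack/*.pack,
objects/info/alternates). It says nothing about how repoSpanner or
gitaly lay out their own storage, and the function name
object_store_summary is just something I made up for illustration.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def object_store_summary(git_dir):
    """Count loose objects and packs, and list alternates, for a git dir.

    `git_dir` is the .git directory of a normal clone, or the repository
    directory itself for a bare repo. Assumes stock git layout only.
    """
    objects = Path(git_dir) / "objects"

    # Loose objects: one file per object under a two-hex-character
    # fan-out directory. This is the "large number of small files" case.
    loose = 0
    for fanout in objects.iterdir():
        if fanout.is_dir() and len(fanout.name) == 2 and set(fanout.name) <= HEX_DIGITS:
            loose += sum(1 for entry in fanout.iterdir() if entry.is_file())

    # Packed objects: many objects delta-compressed into one .pack file.
    pack_dir = objects / "pack"
    packs = len(list(pack_dir.glob("*.pack"))) if pack_dir.is_dir() else 0

    # Alternates: extra object directories this repo borrows objects from,
    # one path per line. Blank lines and '#' comment lines are skipped.
    alternates = []
    alt_file = objects / "info" / "alternates"
    if alt_file.is_file():
        for line in alt_file.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                alternates.append(line)

    return {"loose": loose, "packs": packs, "alternates": alternates}
```

Calling object_store_summary(".git") in a checkout (or passing the repo
path for a bare repo) gives a rough picture of how many single-file
objects exist versus packs, and whether the repo shares objects with
another one through alternates. `git count-objects -v` reports similar
numbers and is the better tool for real use.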