Hi Jeff,

On Thu, Nov 19, 2020 at 09:04:34AM -0500, Jeff Hostetler wrote:
> On 11/18/20 11:01 PM, Matheus Tavares wrote:
> >
> > On Mon, Nov 16, 2020 at 12:19 PM Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >> I can't really speak to NFS performance, but I have to wonder if there's
> >> not something else affecting the results -- 4 and/or 8 core results are
> >> better than 16+ results in some columns. And we get diminishing returns
> >> after ~16.
> >
> > Yeah, that's a good point. I'm not sure yet what's causing the
> > diminishing returns, but Geert and I are investigating. Maybe we are
> > hitting some limit for parallelism in this scenario.
>
> I seem to recall back when I was working on this problem that
> the unzip of each blob was a major pain point. Combine this
> long delta-chains and each worker would need multiple rounds of
> read/memmap, unzip, and de-delta before it had the complete blob
> and could then smudge and write.

I think that there are two cases here:

1) (CPU-bound case) On local machines with multiple cores and SSD disks,
checkout is CPU bound, and parallel checkout helps because the unzipping
can now run on multiple CPUs in parallel. Shorter delta chains would use
less CPU time, and we'd see a similar benefit for both parallel and
sequential checkout.

2) (IO-bound case) On networked file systems, file system IO is pretty
much always the bottleneck for git and similar applications that use
small files. On NFS, calling open() is always a round trip, and so is
close() (in the absence of delegations and O_CREAT). The latency of
these calls depends on the NFS server and the network distance, but 1ms
is a reasonable order of magnitude when thinking about this. Because
this 1ms is a lot more than the typical CPU time needed to process a
single blob, checkout will be IO bound. Parallel checkout works by
allowing the application to maintain an IO depth > 1 for these
workloads, which amortizes the network latency over multiple in-flight
requests.

Regarding the diminishing returns: I did some initial analysis of
Matheus' code, and I'm not sure yet what causes them. I do see the code
achieving a high IO depth in our server logs, which would indicate that
the diminishing returns are caused by file system contention. This
would have to be some kind of general contention, since it happens on
both NFS and EFS. I will do a deeper investigation into this and will
report what I find.

Best regards,

Geert
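
P.S. To make the IO-depth argument in case 2 concrete, here is a minimal
sketch (this is not Matheus' implementation; the mount path, worker
count, and file count are made-up values) showing how driving the
open()/write()/close() sequence from a small worker pool amortizes the
per-call round-trip latency, compared to issuing the same calls one at
a time:

    # io_depth_sketch.py -- toy comparison of sequential vs. parallel
    # file creation on a high-latency (e.g. NFS) file system.
    import os
    import time
    from concurrent.futures import ThreadPoolExecutor

    NUM_FILES = 1000    # hypothetical checkout size
    NUM_WORKERS = 16    # IO depth maintained by the pool

    def write_file(path):
        # On NFS, open(O_CREAT) and close() each cost roughly one
        # network round trip (~1ms), which dwarfs the CPU time spent
        # on a small payload.
        with open(path, "wb") as f:
            f.write(b"blob contents")

    def sequential(paths):
        # IO depth 1: the per-file round-trip latencies simply add up.
        for p in paths:
            write_file(p)

    def parallel(paths):
        # IO depth ~NUM_WORKERS: up to 16 round trips are in flight at
        # once, so wall time shrinks roughly by the achieved depth.
        with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
            list(pool.map(write_file, paths))

    if __name__ == "__main__":
        paths = [f"/mnt/nfs/checkout/file{i}" for i in range(NUM_FILES)]
        for fn in (sequential, parallel):
            start = time.monotonic()
            fn(paths)
            print(f"{fn.__name__}: {time.monotonic() - start:.2f}s")
            for p in paths:
                os.unlink(p)

With ~1ms per round trip and two round trips per file, the sequential
run would take on the order of 2s for 1000 files, while depth 16 would
predict something closer to 0.13s. Whatever keeps the measured numbers
above that prediction as the worker count grows is where the kind of
contention I mentioned above would show up.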