>-----Original Message-----
>From: Christian Zitzmann <Christian.Zitzmann@xxxxxxxxxxx>
>Sent: January 2, 2023 11:45 AM
>
>We have been using Git for many years, making heavy use of submodules.
>
>When updating submodules, only the fetch part is done in parallel (with the
>submodule.fetchJobs config or --jobs); the checkout is done sequentially.
>
>What I have noticed when cloning with
>- scalar clone --full-clone --recurse-submodules <URL> or
>- git clone --filter=blob:none --also-filter-submodules --recurse-submodules <URL>
>
>is that we lose performance, because the fetch of the blobs happens in the
>sequential checkout part instead of the parallel part.
>
>Furthermore, even without partial clone, the utilization of network and hard
>disk is not always good: first the network is utilized (fetch), then the hard
>disk (checkout).
>
>Since the checkout part is local to the submodule (no shared resources to
>block on), it would be great if we could move the checkout into the
>parallelized part, e.g. by doing fetch and checkout (with blob fetching) in
>one step with something like run_processes_parallel_tr2().
>
>I expect this would significantly improve performance, especially when using
>partial clones.
>
>Do you think this is possible? Am I missing anything in my reasoning?

Since this is a platform-specific request, if it happens, it should be a configuration switch that defaults to off. On my platform the file system itself is fairly fast, but name service traversals and resolutions (what happens in the name service) are a performance problem, so doing the checkout/switch in parallel would actually be counter-productive in my case. I would keep it off, but I understand that other platforms could benefit.

Regards,
Randall

--
Brief whoami: NonStop&UNIX developer since approximately
UNIX(421664400)
NonStop(211288444200000000)
-- In real life, I talk too much.
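The scheduling change proposed above can be illustrated with a toy model. This is not Git's implementation; the function names, submodule names, and worker-pool mechanics below are invented purely to contrast the two schedules: today's parallel-fetch-then-sequential-checkout versus running each submodule's fetch+checkout as a single task in the parallel phase.

```python
# Toy model of the proposal (illustration only; not Git's code).
# "fetch" is the parallel-safe network step; "checkout" is local to each
# submodule, so the proposal moves it into the same parallel phase.
from concurrent.futures import ThreadPoolExecutor

def fetch(submodule):
    # Stand-in for the network fetch of one submodule.
    return f"fetched {submodule}"

def checkout(submodule):
    # Stand-in for the working-tree checkout (where partial-clone
    # blob fetches currently land).
    return f"checked out {submodule}"

def current_scheme(submodules, jobs=4):
    # Phase 1: parallel fetch (submodule.fetchJobs / --jobs).
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        list(pool.map(fetch, submodules))
    # Phase 2: sequential checkout.
    return [checkout(s) for s in submodules]

def proposed_scheme(submodules, jobs=4):
    # One parallel phase: fetch and checkout per submodule as one task,
    # overlapping network and disk utilization across submodules.
    def fetch_and_checkout(s):
        fetch(s)
        return checkout(s)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(fetch_and_checkout, submodules))

subs = ["libfoo", "libbar", "libbaz"]
print(proposed_scheme(subs))
```

Both schemes produce the same end state per submodule; the difference is purely in scheduling, which is why, as noted in the reply, a configuration switch could select between them per platform.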