>-----Original Message-----
>From: Christian Zitzmann <Christian.Zitzmann@xxxxxxxxxxx>
>Sent: January 2, 2023 11:45 AM
>
>We have been using Git for many years, making heavy use of submodules.
>
>When updating submodules, only the fetch part is done in parallel (with the
>submodule.fetchJobs config or --jobs); the checkout is done sequentially.
>
>What I have noticed when cloning with
>- scalar clone --full-clone --recurse-submodules <URL> or
>- git clone --filter=blob:none --also-filter-submodules --recurse-submodules <URL>
>
>is that we lose performance, because the fetch of the blobs happens in the
>sequential checkout part instead of the parallel part.
>
>Furthermore, even without partial clone, the utilization of network and hard
>disk is not always good: first the network is utilized (fetch), then the hard
>disk (checkout).
>
>Since the checkout part is local to the submodule (no shared resources to
>block on), it would be great if we could move the checkout into the
>parallelized part, e.g. by doing fetch and checkout (with blob fetching) in
>one step with something like run_processes_parallel_tr2().
>
>I expect this would significantly improve performance, especially when using
>partial clones.
>
>Do you think this is possible? Am I missing anything in my reasoning?

Since this is a platform-specific request, if it happens, it should be a configuration switch that defaults to off. On my platform the file system itself is fairly fast, but name service traversals and resolutions (what happens in the name service) are a performance problem, so doing the checkout/switch in parallel would actually be counter-productive in my case. I would keep it off, but I understand that other platforms could benefit.

Regards,
Randall

--
Brief whoami: NonStop&UNIX developer since approximately
UNIX(421664400)
NonStop(211288444200000000)
-- In real life, I talk too much.
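The scheduling change proposed above can be illustrated with a toy model. This is not Git's implementation; the function names, submodule names, and worker-pool mechanics below are invented purely to contrast the two schedules: today's parallel-fetch-then-sequential-checkout versus running each submodule's fetch+checkout as a single task in the parallel phase.

```python
# Toy model of the proposal (illustration only; not Git's code).
# "fetch" is the parallel-safe network step; "checkout" is local to each
# submodule, so the proposal moves it into the same parallel phase.
from concurrent.futures import ThreadPoolExecutor

def fetch(submodule):
    # Stand-in for the network fetch of one submodule.
    return f"fetched {submodule}"

def checkout(submodule):
    # Stand-in for the working-tree checkout (where partial-clone
    # blob fetches currently land).
    return f"checked out {submodule}"

def current_scheme(submodules, jobs=4):
    # Phase 1: parallel fetch (submodule.fetchJobs / --jobs).
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        list(pool.map(fetch, submodules))
    # Phase 2: sequential checkout.
    return [checkout(s) for s in submodules]

def proposed_scheme(submodules, jobs=4):
    # One parallel phase: fetch and checkout per submodule as one task,
    # overlapping network and disk utilization across submodules.
    def fetch_and_checkout(s):
        fetch(s)
        return checkout(s)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(fetch_and_checkout, submodules))

subs = ["libfoo", "libbar", "libbaz"]
print(proposed_scheme(subs))
```

Both schemes produce the same end state per submodule; the difference is purely in scheduling, which is why, as noted in the reply, a configuration switch could select between them per platform.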