Re: Parallel checkout (Was Re: 0 bot for Git)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 15, 2016 at 10:08 PM, Stefan Beller <sbeller@xxxxxxxxxx> wrote:
> On Fri, Apr 15, 2016 at 2:51 AM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
>> On Fri, Apr 15, 2016 at 12:04:49AM +0200, Christian Couder wrote:
>> The idea is simple, you offload some work to process workers. In this
>> patch, only entry.c:write_entry() is moved to workers. We still do
>> directory creation and all sort of checks and stat refresh in the main
>> process. Some more work may be moved away, for example, the entire
>> builtin/checkout.c:checkout_merged().
>>
>> Multi process is less efficient than multi thread model. But I doubt
>> we could make object db access thread-safe soon. The last discussion
>> was 2 years ago [1] and nothing much has happened.
>>
>> Numbers are encouraging though. On linux-2.6 repo running on linux and
>> ext4 filesystem, checkout_paths() would dominate "git checkout :/".
>> Unmodified git takes about 31s.
>
> Please also benchmark "make build" or another read heavy operation
> with these 2 different checkouts. IIRC that was the problem. (checkout
> improved, but due to file ordering on the fs, the operation afterwards
> slowed down, such that it became a net negative)

That's way too close to fs internals. Don't filesystems these days
have b-tree and indexes to speed up pathname lookup (which makes file
creation order meaningless, I guess)? If it only happens to a fs or
two, I'm leaning to say "your problem, fix your file system". A
mitigation may be let worker handle whole directory (non-recursively)
so file creation order within a directory is almost the same.

> Would it make sense to use the parallel processing infrastructure from
> run-command.h
> instead of doing all setup and teardown yourself?
> (As you call it for-fun patch, I'd assume the answer is: Writing code
> is more fun than
> using other peoples code ;)

I did look at run-command.h. Your run_process_parallel() looked almost
fit, but I needed control over stdout for coordination, not to be
printed. At that point, yes writing new code was more fun than
tweaking run_process_parallel :-D
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]