Re: [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Glen Choo <chooglen@xxxxxxxxxx> writes:

> It would be good if we could avoid mixing unrelated information sources
> in "struct submodule_status_util", since a) this makes it very tightly
> coupled to run_diff_files() and b) it causes us to repeat ourselves in
> the same function (.changed = changed, record_file_diff()).
>
> The only reason why the code looks this way right now is that
> match_stat_with_submodule() sets defer_submodule_status based on whether
> or not we should ignore the submodule, and this eventually tells
> get_submodule_status() what submodules it needs to care about. But,
> deciding whether to spawn a subprocess for which submodule is exactly
> what the .get_next_task member is for.
>
>> diff --git a/submodule.c b/submodule.c
>> index 426074cebb..6f6e150a3f 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -1981,6 +1994,121 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
>>  	return dirty_submodule;
>>  }
>>  
>> +static struct status_task *
>> +get_status_task_from_index(struct submodule_parallel_status *sps,
>> +			   struct strbuf *err)
>> +{
>> +	for (; sps->index_count < sps->submodule_names->nr; sps->index_count++) {
>> +		struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util;
>> +		struct status_task *task;
>> +
>> +		if (!verify_submodule_git_directory(util->path))
>> +			continue;
>
> So right here, we could use the "check if this submodule should be
> ignored" logic form match_stat_with_submodule() to decide whether or not
> to spawn the subprocess. IOW, I am advocating for
> get_submodules_status() to be a parallel version of
> match_stat_with_submodule() (not a parallel version of
> is_submodule_modified() that shuttles extra information).

It turns out to be quite difficult to implement a parallel
match_stat_with_submodule():

  a) we can't remove it because it still has another caller
  b) its internals are quite hard to refactor: one conditional arm depends
    on "changed", which is set by calling ie_match_stat(), which in turn
    requires the "struct stat" to have already been lstat()-ed...

So even though this series adds a lot, it is just about as minimally
invasive as possible.

I suspect that there are some possible cleanups down the line, e.g.
is_submodule_modified() is rightfully only called by diff-lib.c , so I
think it should be a static function there. And once we move that, we
can make our parallel function static, and then we don't have to worry
about tight coupling to run_diff_files(). To keep the range-diff
manageable, that can be left for a future cleanup though.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux