Re: [RFC PATCH 0/5] recursively grep across submodules

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 27, 2016 at 4:26 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Brandon Williams <bmwill@xxxxxxxxxx> writes:
>
>> As for the rest of the series, it should be ready for review or comments.
>
> Just a few brief comments, before reading the patches carefully.
>
>  * It is somewhat surprising that [1/5] is even needed (in other
>    words, I would have expected something like this to be already
>    there, and my knee-jerk reaction was "Heh, how does 'git status'
>    know how to show submodules that are and are not initialized
>    differently without this?"

The issue with much of the existing code is that it is submodule centric,
i.e. it is written to not care about the rest.

git status for example just calls "git submodule summary" to
parse and display the submodule information additionally.
It doesn't integrate submodules and treats them "just like files".

git submodule summary then proceeds to use "submodule--helper list"
that lists submodules *only* ignoring all files.

>
>    The implementation that reads from the config of the current
>    repository may be OK, but I actually would have expected that a
>    check would be "given a $path, check to see if $path/.git is
>    there and is a valid repository".  In a repository where the
>    submodules originate, there may not even be submodule.$name.url
>    entries there yet.

My reaction to 1/5 was that the implementation is sound,
but the design may need rethinking.

Instead of asking all these question, "Is a submodule
* initialized
* checked out (== have a working dir)
* have a .git dir (think of deleted submodules that keep the
  historical git dir around)
(* have commit X)
we would want to either extend the submodule-config API
to also carry these informations just like
name/path/sha1/url/shallow clone recommendation.

Obtaining the information above is however not as cheap,
because we'd need to do extra work additionally to parsing
the .gitmodules file. So the submodule-config would need to learn
an input that will tell the submodule-config what informations should
be evaluated and which can be omitted.

>
>  * It is somewhat surprising that [4/5] does not even use the
>    previous ls-files to find out the paths.  Also it is a bit
>    disappointing to see that the way processes are spawned and
>    managed does not share much with Stefan's earlier work, i.e.
>    run_processes_parallel().  I was somehow hoping that it can be
>    extended to support this use case, but apparently there aren't
>    much to be shared.

I think there are 2 issues here:
* The API I designed runs processes in parallel and the order or
  output is non-deterministic. git-grep uses threads and output is
  alphabetically sorted. The order is fixable though (by e.g. adding
  a flag that indicates which parallel processing output the caller
  wants).

* git-grep already has its own thread pool; integrating/combining
  2 worker pools doesn't sound trivial even to someone who wrote
  one of them.
  Maybe we could extend/rewrite the run_processes_parallel
  API to not just run processes, but instead you could also provide
  a function pointer that is used in a thread instead.
  Then we'd have one machinery that e.g. keeps track of the
  number of parallel processes/threads.

Thanks,
Stefan



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]