On 07/11, Jonathan Nieder wrote: > Hi, > > Brandon Williams wrote: > > > Convert grep to use 'struct repository' which enables recursing into > > submodules to be handled in-process. > > \o/ > > This will be even nicer with the changes described at > https://public-inbox.org/git/20170706202739.6056-1-sbeller@xxxxxxxxxx/. > Until then, I fear it will cause a regression --- see (*) below. > > [...] > > Documentation/git-grep.txt | 7 - > > builtin/grep.c | 390 +++++++++------------------------------------ > > cache.h | 1 - > > git.c | 2 +- > > grep.c | 13 -- > > grep.h | 1 - > > setup.c | 12 +- > > 7 files changed, 81 insertions(+), 345 deletions(-) > > Yay, tests still pass. > > [..] > > --- a/Documentation/git-grep.txt > > +++ b/Documentation/git-grep.txt > > @@ -95,13 +95,6 @@ OPTIONS > > <tree> option the prefix of all submodule output will be the name of > > the parent project's <tree> object. > > > > ---parent-basename <basename>:: > > - For internal use only. In order to produce uniform output with the > > - --recurse-submodules option, this option can be used to provide the > > - basename of a parent's <tree> object to a submodule so the submodule > > - can prefix its output with the parent's name rather than the SHA1 of > > - the submodule. > > Being able to get rid of this is a very nice change. > > [...] > > +++ b/builtin/grep.c > [...] > > @@ -366,14 +349,10 @@ static int grep_file(struct grep_opt *opt, const char *filename) > > { > > struct strbuf buf = STRBUF_INIT; > > > > - if (super_prefix) > > - strbuf_addstr(&buf, super_prefix); > > - strbuf_addstr(&buf, filename); > > - > > if (opt->relative && opt->prefix_length) { > > - char *name = strbuf_detach(&buf, NULL); > > - quote_path_relative(name, opt->prefix, &buf); > > - free(name); > > + quote_path_relative(filename, opt->prefix, &buf); > > + } else { > > + strbuf_addstr(&buf, filename); > > } > > style micronit: can avoid these braces since both branches are > single-line. Didn't realize that with all the deleted lines, I'll fix for the next version. > > [...] > > @@ -421,284 +400,80 @@ static void run_pager(struct grep_opt *opt, const char *prefix) > > exit(status); > > } > > > > -static void compile_submodule_options(const struct grep_opt *opt, > > - const char **argv, > > - int cached, int untracked, > > - int opt_exclude, int use_index, > > - int pattern_type_arg) > > -{ > [...] > > - /* > > - * Limit number of threads for child process to use. > > - * This is to prevent potential fork-bomb behavior of git-grep as each > > - * submodule process has its own thread pool. > > - */ > > - argv_array_pushf(&submodule_options, "--threads=%d", > > - (num_threads + 1) / 2); > > Being able to get rid of this is another very nice change. > > [...] > > + /* add objects to alternates */ > > + add_to_alternates_memory(submodule.objectdir); > > (*) This sets up a single in-memory object store with all the > processed submodules. Processed objects are never freed. > This means that if I run a command like > > git grep --recurse-submodules -e neverfound HEAD > > in a project with many submodules then memory consumption scales in > the same way as if the project were all one repository. By contrast, > without this patch, git is able to take advantage of the implicit > free() when each child exits to limit its memory usage. > > Worse, this increases the number of pack files git has to pay > attention to the sum of the numbers of pack files in all the > repositories processed so far. A single object lookup can take > O(number of packs * log(number of objects in each pack)) time. That > means performance is likely to suffer as the number of submodules > increases (n^2 performance) even on systems with a lot of memory. > > Once the object store is part of the repository struct and freeable, > those problems go away and this patch becomes a no-brainer. > > What should happen until then? Should this go in "next" so we can get > experience with it but with care not to let it graduate to "master"? I agree that this is an issue and that we need to address by having an object store per repository. While that is being worked on (by Stefan) I don't know how long it would take to have it be a reality. So the question ends up being do we care more about the state of the code and cleaning up a lot of 'hacks' that I introduced to get grep working with submodules, or do we care about the performance more. I don't know which is the right answer but I'd personally like to see the hacks I added to be removed sooner rather than later. That and I think that with the code in this sate it would make it easier to transition once we have per-repository object-stores. Either way I should add a NEEDSWORK comment here to indicate that it should be removed once per-repo object-stores exist. > > Aside from those two concerns, this patch looks very good from a quick > skim, though I haven't reviewed it closely line-by-line. Once we know > how to go forward, I'm happy to look at it again. > > Thanks, > Jonathan -- Brandon Williams