On Fri, Sep 09, 2016 at 02:53:24PM -0700, Brandon Williams wrote: > Allow ls-files to recognize submodules in order to retrieve a list of > files from a repository's submodules. This is done by forking off a > process to recursively call ls-files on all submodules. > > Signed-off-by: Brandon Williams <bmwill@xxxxxxxxxx> > --- > Hey git developers! > > I'm new to the community and this is the first patch for an open source project > that I have worked on. > > I'm looking forward to working on the project! Welcome. :) Submodules are not really my area of expertise, so I don't have any commentary on the goal of the patch, except that it sounds reasonable to my layman's ears. The implementation looks fairly clean. A few comments: > +static void show_gitlink(const struct cache_entry *ce) > +{ > + struct child_process cp = CHILD_PROCESS_INIT; > + struct strbuf buf = STRBUF_INIT; > + struct strbuf name = STRBUF_INIT; > + int submodule_name_len; > + FILE *fp; > + > + argv_array_push(&cp.args, "ls-files"); > + argv_array_push(&cp.args, "--recurse-submodules"); > + cp.git_cmd = 1; > + cp.dir = ce->name; > + cp.out = -1; > + start_command(&cp); > + fp = fdopen(cp.out, "r"); You should error-check the result of start_command(). I guess the reasonable outcome would be to die(), as it is a sign that we could not fork, find git, etc. Ditto for fdopen (you can use xfdopen for that). > + /* > + * The ls-files child process produces filenames relative to > + * the submodule. Prefix each line with the submodule path > + * to make it relative to the current repository. > + */ > + strbuf_addstr(&name, ce->name); > + strbuf_addch(&name, '/'); > + submodule_name_len = name.len; > + while (strbuf_getline(&buf, fp) != EOF) { > + strbuf_addbuf(&name, &buf); > + write_name(name.buf); > + strbuf_setlen(&name, submodule_name_len); > + } What happens if the filename in the submodule needs quoting? You'll get the quoted value in your buffer, and then re-quote it again in write_name(). The simplest thing would probably be to use "ls-files -z" for the recursive invocation, and then split on NUL bytes (we have strbuf_getline_nul for that). > + finish_command(&cp); What should happen if finish_command() tells us that the ls-files sub-process reported an error? It may not be worth aborting the rest of the listing, but we might want to propagate that in our own return code. > + strbuf_release(&buf); > + strbuf_release(&name); > + fclose(fp); > +} A minor style nit, but I would generally fclose(fp) before running finish_command() (i.e., resource clean up in the reverse order of allocation). It doesn't matter in this case because "fp" is output from the process, and we know we've already read to EOF. For other cases, it could cause a deadlock (e.g., we end up in wait() for the child process to finish, but it is blocked in write() waiting for us to read). So I think it's a good habit to get into. > @@ -519,6 +566,17 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) > if (require_work_tree && !is_inside_work_tree()) > setup_work_tree(); > > + if (recurse_submodules && > + (show_stage || show_deleted || show_others || show_unmerged || > + show_killed || show_modified || show_resolve_undo || > + show_valid_bit || show_tag || show_eol)) > + die("ls-files --recurse-submodules can only be used in " > + "--cached mode"); Woah, that list of variables is getting rather long. This is not a problem introduced by your patch, so it's not a blocker. But I wonder if some of them are mutually exclusive and could be collapsed to a single variable. I guess the reason for this "only with --cached" is that you do not propagate the options down to the recursive process. If we were to do that, then this big list of restrictions would go away. I'd be OK with starting with more limited functionality like your patch, though. I think doing the recursive thing correctly would also involve parsing the output of each to append the filename prefix. So I suppose another option would be to teach ls-files a "prefix" option to add to each filename, and just pass in the submodule path. Then you can let the sub-processes write directly to the common stdout, and I think it would be safe to blindly pass the parent argv into the child processes. -Peff