Re: [PATCH v6 3/3] diff: add SUBMODULE_DIFF format to display submodule diff

Jacob Keller <jacob.keller@xxxxxxxxx> · Tue, 16 Aug 2016 13:25:46 -0700

On Tue, Aug 16, 2016 at 11:48 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Jacob Keller <jacob.e.keller@xxxxxxxxx> writes:
>
>> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
>> index d5a5b17d5088..f5d693afad6c 100644
>> --- a/Documentation/diff-config.txt
>> +++ b/Documentation/diff-config.txt
>> @@ -123,7 +123,8 @@ diff.suppressBlankEmpty::
>>  diff.submodule::
>>       Specify the format in which differences in submodules are
>>       shown.  The "log" format lists the commits in the range like
>> -     linkgit:git-submodule[1] `summary` does.  The "short" format
>> +     linkgit:git-submodule[1] `summary` does.  The "diff" format shows the
>> +     diff between the beginning and end of the range. The "short" format
>>       format just shows the names of the commits at the beginning
>>       and end of the range.  Defaults to short.
>
> It would be much better to describe the default one first and then
> more involved one next, no?  That would also match what the change
> to "diff-options" in this patch does (can be seen below).
>

The main thing is that "--submodule" alone means "use the log format"
so I think that's why it went first. I can reword this to make it more
clear how this works.

Thanks,
Jake

>> @@ -2311,6 +2322,15 @@ static void builtin_diff(const char *name_a,
>>                               two->dirty_submodule,
>>                               meta, del, add, reset);
>>               return;
>> +     } else if (o->submodule_format == DIFF_SUBMODULE_DIFF &&
>> +                (!one->mode || S_ISGITLINK(one->mode)) &&
>> +                (!two->mode || S_ISGITLINK(two->mode))) {
>> +             show_submodule_diff(o->file, one->path ? one->path : two->path,
>> +                             line_prefix,
>> +                             one->oid.hash, two->oid.hash,
>> +                             two->dirty_submodule,
>> +                             meta, a_prefix, b_prefix, reset);
>> +             return;
>>       }
>
> The "either missing or is submodule" check used here is being
> consistent with the existing "submodule=log" case.  Good.
>
>> diff --git a/submodule.c b/submodule.c
>> index 1b5cdfb7e784..b1da68dd49c9 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -333,6 +333,136 @@ static void print_submodule_summary(struct rev_info *rev, FILE *f,
>>       strbuf_release(&sb);
>>  }
>>
>> +void show_submodule_diff(FILE *f, const char *path,
>> +             const char *line_prefix,
>> +             unsigned char one[20], unsigned char two[20],
>> +             unsigned dirty_submodule, const char *meta,
>> +             const char *a_prefix, const char *b_prefix,
>> +             const char *reset)
>> +{
>> +     struct strbuf submodule_git_dir = STRBUF_INIT, sb = STRBUF_INIT;
>> +     struct child_process cp = CHILD_PROCESS_INIT;
>> +     const char *git_dir;
>> +
>> +     if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) {
>> +             fprintf(f, "%sSubmodule %s contains untracked content\n",
>> +                     line_prefix, path);
>> +     }
>> +     if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED) {
>> +             fprintf(f, "%sSubmodule %s contains modified content\n",
>> +                     line_prefix, path);
>> +     }
>> +
>> +     strbuf_addf(&sb, "%s%sSubmodule %s %s..",
>> +                 line_prefix, meta, path,
>> +                 find_unique_abbrev(one, DEFAULT_ABBREV));
>> +     strbuf_addf(&sb, "%s:%s",
>> +                 find_unique_abbrev(two, DEFAULT_ABBREV),
>> +                 reset);
>> +     fwrite(sb.buf, sb.len, 1, f);
>> +
>> +     if (is_null_sha1(one))
>> +             fprintf(f, " (new submodule)");
>> +     if (is_null_sha1(two))
>> +             fprintf(f, " (submodule deleted)");
>
> These messages are in sync with show_submodule_summary() that is
> used in --submodule=log codepath.  Good.
>

They're not exactly the same due to some ways of splitting up new lines.

>> +     /*
>> +      * We need to determine the most accurate location to call the sub
>> +      * command, and handle the various corner cases involved. We'll first
>> +      * attempt to use the path directly if the submodule is checked out.
>> +      * Then, if that fails, we'll check the standard module location in
>> +      * the git directory. If even this fails, it means we can't do the
>> +      * lookup because the module has not been initialized.
>> +      */
>
> This is more elaborate than what show_submodule_summary() does,
> isn't it?  Would it make the patch series (and the resulting code)
> more understandable if you used the same code by refactoring these
> two?  If so, I wonder if it makes sense to split 3/3 into a few
> separate steps:

The show_submodule_summary just uses "add_submodule_odb" which adds
the submodule as an alternate source of objects, if I understand
correctly.

>
>  * Update the internal "--submodule=<type>" handling without adding
>    the "--submodule=diff" and show_submodule_diff() function.
>
>  * Refactor the determination of the submodule status (i.e. does it
>    even have a clone?  where is its repository? etc.) from existing
>    show_submodule_summary() into a separate helper function.
>
>  * Make that helper function more elaborate like what you do here,
>    and update show_submodule_summary().  I think the state
>    show_submodule_summary() calls "not checked out" corresponds to
>    what you say "not initialized" below, and they should share the
>    same logic to determine that the submodule is in that state, and
>    share the same message, for example.

Makes sense, I might squash that with the above if it's easier.

>
>  * Introduce "--submodule=diff", and show_submodule_diff() function;
>    the latter would use the helper function prepared in the previous
>    step.
>
> perhaps?
>

That makes more sense. I'll rework the series some more.

>> +
>> +     if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED) {
>> +             /*
>> +              * If the submodule has modified contents we want to diff
>> +              * against the work tree, so don't add a second parameter.
>> +              * This is essentially equivalent of using diff-index instead.
>> +              * Note that we can't (easily) show the diff of any untracked
>> +              * files.
>> +              */
>> +     } else if (is_null_sha1(two)) {
>
> It is safer to have ';' inside the empty if(){} block to make sure
> that one empty statement exists there.  It makes the intention of
> the code clearer, too.
>

Will do.

> I am debating myself if this is a good thing to do, though.  The
> submodule is a separate project for a reason, and there is a reason
> why the changes haven't been committed.  When asking "what's different
> between these two superproject states?", should the user really see
> these uncommitted changes?
>
> Thanks.

Well, the previous submodule code for "log" prints out "submodule has
untracked content" and "submodule has modified content" so I figured
the diff might want to show that as a diff too? Or maybe we just stick
with the messages and only show the difference of what's actually been
committed.. I think that's probably ok too.

Regards,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html