Re: RFC/Discussion - Submodule UX Improvements

Jacob Keller <jacob.keller@xxxxxxxxx> · Tue, 20 Apr 2021 23:57:17 -0700

On Tue, Apr 20, 2021 at 11:47 AM Emily Shaffer <emilyshaffer@xxxxxxxxxx> wrote:
>
> On Tue, Apr 20, 2021 at 09:18:05AM -0700, Jacob Keller wrote:
> >
> > On Mon, Apr 19, 2021 at 12:28 PM Randall S. Becker
> > <rsbecker@xxxxxxxxxxxxx> wrote:
> > > On April 19, 2021 3:15 PM, Jacob Keller wrote:
> > > > A sort of dream I had was a flow where I could do something from the parent
> > > > like "git blame <path/to/submodule>/submodule/file" and have it present a
> > > > blame of that files contents keyed on the *parent* commit that changed the
> > > > submodule to have that line, as opposed to being forced to go into the
> > > > submodule and figure out what commit introduced it and then go back to the
> > > > parent and find out what commit changed the submodule to include that
> > > > submodule commit.
> > >
> > > Not going to disagree, but are you looking for the blame on the submodule ref file itself or files in the submodule? It's hard to teach git to do a blame on a one-line file.
> > >
> >
> > Well, I would like if "git blame <path/to/submodule>" did.. something
> > other than just fail. Sometimes my brain is working in a "blame where
> > this came from" and I type that out and then get frustrated when it
> > fails. Additionally...
> >
> > > Otherwise, and I think this is what you really are going for, teaching it to do a blame based on "git blame <path/to/submodule>/submodule/file" would be very nice and abstracts out the need for the user (or more importantly to me = scripts) to understand that a submodule is involved; however, it is opening up a very large door: "should/could we teach git to abstract submodules out of every command". This would potentially replace a significant part of the use cases for the "git submodule foreach" sub-command. In your ask, the current paradigm "cd <path/to/submodule>/submodule && git blame file" or pretty much every other command does work, but it requires the user/script to know you have a submodule in the path. So my question is: is this worth the effort? I don't have a good answer to that question. Half of my brain would like this very much/the other half is scared of the impact to the code.
> > >
> > > Just my musings.
> >
> > I'm not asking for "git blame <path/to/submodule>/<file>" to give the
> > the same outout as "cd <path/to/submodule> && git blame <file>"
> >
> > What i'm asking is: given this file, tell me which commit in the
> > parent did the line get introduced. So basically I want to walk over
> > the changes to the submodule pointer and find out when it get
> > introduced into the parent, not when it got introduced into the
> > submodule itself.
> >
> > This is a related question, but it is actually not trivial to go
> > instantly from "it was in xyz submodule commit" to "it was then pulled
> > in by xyz parent commit". It's something that is quite tedious to do
> > manually, especially since the submodule pointer could change
> > arbitrarily so knowing the submodule commit doesn't mean you can
> > simply grep for which commit set the submodule exactly to that commit.
> > Essentially, I want a 'git blame' that ignores all changes which
> > aren't actually the submodule pointer, update.
> >
> > I think that's something that is much harder to do manually, but feels
> > like it should be relatively simple to implement within the blame
> > algorithm. I don't feel like this is something strictly replaceable by
> > "git submodule foreach"
>
> I think I understand what you're saying. Something like the following
> tree:
>
> super   sub
> b------->4
>          3
>          2
> a------->1
>
> producing something like this:
>
> 'git -C sub blame main.c'
>
> 1 AU Thor       2020-01-01
> 2 CO Mitter     2020-01-02              int main() {
> 4 AU Thor       2020-01-04                printf("Hello world!\n");
> 3 Dev E         2020-01-03                return 0;
> 2 CO Mitter     2020-01-02              }
>
> and
> 'git blame sub/main.c'
>
> a Mai N         2020-01-01
> b Senior Dev    2020-01-04              int main() {
> b Senior Dev    2020-01-04                printf("Hello world!\n");
> b Senior Dev    2020-01-04                return 0;
> b Senior Dev    2020-01-04              }
>
> or to put it another way: if we are treating superproject commit as "the
> whole feature", then it could be useful to see "which feature added this
> change" instead of "which atomic commit inside a feature added this
> change".
>

Right. I often want to find out when some change actually made it into
the super project.

> To me, it sounds expensive to compute... wouldn't you  need to say, for
> each blame line, "is this commit an ancestor of the commit associated in
> THIS superproject commit? ...how about the next superproject commit?"
> But I also don't have much experience with the blame implementation so
> maybe I'm thinking naively :) :)

Well I imagine it has to be similar to how we compute the blame for a
regular file? I imagine we start at some commit and walk backwards up
the tree, no?

I imagine the current blame algorithm starts from the current commit
and walks backwards through the commit history, determining which
commit was last to have a given line.

In the submodule case I highlighted, we would be doing the same thing:
Follow the super project history. When you find a submodule file, pull
its contents from the matching submodule commit that the parent
history saw. No need to dig any further into the submodule commit
history, just give me that contents and then I can treat it as if that
contents was what was in the super project for this commit, and use
the normal blame algorithm.

It's much more difficult to do that manually (hence why we invented
blame/annotate in the first place), and trying to go from "git -C
<submodule> blame file" to then figure out which super project commit
introduced the change is also tedious and non-trivial considering you
might now have intermediate or unrelated changes (i.e. it's actually
possible that that particular commit *never* made it into the super
project at all, because it got skipped over, and it might even be
after the file got re-written)

My idea for how blame of submodujles work is to essentially pretend as
if you had subtree merged the contents of the submodule into regular
parent project files with those paths, and then do blame on that using
just the parent project history.... If that makes sense?

>
> And even if it is expensive, considering that Jacob and Randall both had
> different ideas of what their ideal 'git blame' recursive behavior would
> be, maybe it makes sense to use a flag to ask for the more expensive
> behavior, e.g. 'git blame --show-superproject-commit sub/main.c'?
>

Right I imagine that in some ways both are useful, and it depends on
the context of what you're looking for.

The reason I bring up the blame example is because the idea for what I
want is quite tedious to mimic by hand, and requires more than just a
simple git submodule foreach or a cd into the submodule to operate on
it as a standalone repository.

>  - Emily