On Tue, Apr 20, 2021 at 11:47 AM Emily Shaffer <emilyshaffer@xxxxxxxxxx> wrote:
>
> On Tue, Apr 20, 2021 at 09:18:05AM -0700, Jacob Keller wrote:
> >
> > On Mon, Apr 19, 2021 at 12:28 PM Randall S. Becker
> > <rsbecker@xxxxxxxxxxxxx> wrote:
> > > On April 19, 2021 3:15 PM, Jacob Keller wrote:
> > > > A sort of dream I had was a flow where I could do something from the parent
> > > > like "git blame <path/to/submodule>/submodule/file" and have it present a
> > > > blame of that file's contents keyed on the *parent* commit that changed the
> > > > submodule to have that line, as opposed to being forced to go into the
> > > > submodule and figure out what commit introduced it and then go back to the
> > > > parent and find out what commit changed the submodule to include that
> > > > submodule commit.
> > >
> > > Not going to disagree, but are you looking for the blame on the submodule ref file itself or files in the submodule? It's hard to teach git to do a blame on a one-line file.
> > >
> >
> > Well, I would like if "git blame <path/to/submodule>" did.. something
> > other than just fail. Sometimes my brain is working in a "blame where
> > this came from" and I type that out and then get frustrated when it
> > fails. Additionally...
> >
> > > Otherwise, and I think this is what you really are going for, teaching it to do a blame based on "git blame <path/to/submodule>/submodule/file" would be very nice and abstracts out the need for the user (or more importantly to me = scripts) to understand that a submodule is involved; however, it is opening up a very large door: "should/could we teach git to abstract submodules out of every command". This would potentially replace a significant part of the use cases for the "git submodule foreach" sub-command. In your ask, the current paradigm "cd <path/to/submodule>/submodule && git blame file" or pretty much every other command does work, but it requires the user/script to know you have a submodule in the path.
> > > So my question is: is this worth the effort? I don't have a good answer to that question. Half of my brain would like this very much/the other half is scared of the impact to the code.
> > >
> > > Just my musings.
> >
> > I'm not asking for "git blame <path/to/submodule>/<file>" to give the
> > same output as "cd <path/to/submodule> && git blame <file>"
> >
> > What I'm asking is: given this file, tell me in which commit in the
> > parent the line got introduced. So basically I want to walk over
> > the changes to the submodule pointer and find out when it got
> > introduced into the parent, not when it got introduced into the
> > submodule itself.
> >
> > This is a related question, but it is actually not trivial to go
> > instantly from "it was in xyz submodule commit" to "it was then pulled
> > in by xyz parent commit". It's something that is quite tedious to do
> > manually, especially since the submodule pointer could change
> > arbitrarily so knowing the submodule commit doesn't mean you can
> > simply grep for which commit set the submodule exactly to that commit.
> > Essentially, I want a 'git blame' that ignores all changes which
> > aren't actually the submodule pointer update.
> >
> > I think that's something that is much harder to do manually, but feels
> > like it should be relatively simple to implement within the blame
> > algorithm. I don't feel like this is something strictly replaceable by
> > "git submodule foreach"
>
> I think I understand what you're saying.
> Something like the following
> tree:
>
>   super    sub
>   b------->4
>            3
>            2
>   a------->1
>
> producing something like this:
>
>   'git -C sub blame main.c'
>
>   1 AU Thor   2020-01-01
>   2 CO Mitter 2020-01-02 int main() {
>   4 AU Thor   2020-01-04   printf("Hello world!\n");
>   3 Dev E     2020-01-03   return 0;
>   2 CO Mitter 2020-01-02 }
>
> and
> 'git blame sub/main.c'
>
>   a Mai N      2020-01-01
>   b Senior Dev 2020-01-04 int main() {
>   b Senior Dev 2020-01-04   printf("Hello world!\n");
>   b Senior Dev 2020-01-04   return 0;
>   b Senior Dev 2020-01-04 }
>
> or to put it another way: if we are treating superproject commit as "the
> whole feature", then it could be useful to see "which feature added this
> change" instead of "which atomic commit inside a feature added this
> change".
>

Right. I often want to find out when some change actually made it into
the super project.

> To me, it sounds expensive to compute... wouldn't you need to say, for
> each blame line, "is this commit an ancestor of the commit associated in
> THIS superproject commit? ...how about the next superproject commit?"
> But I also don't have much experience with the blame implementation so
> maybe I'm thinking naively :)

:) Well I imagine it has to be similar to how we compute the blame for a
regular file? I imagine the current blame algorithm starts from the
current commit and walks backwards through the commit history,
determining which commit was last to have a given line.

In the submodule case I highlighted, we would be doing the same thing:
Follow the super project history. When you find a submodule file, pull
its contents from the matching submodule commit that the parent history
saw. No need to dig any further into the submodule commit history, just
give me those contents and then I can treat them as if they were what was
in the super project for this commit, and use the normal blame algorithm.
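
To make that walk concrete, here is a minimal Python sketch over a toy
model. It is not git's blame implementation: it assumes a linear super
project history, and it assumes the submodule file's contents have
already been looked up for each super project commit via the recorded
submodule pointer. The names `blame_by_superproject` and `history` are
made up for illustration, and the line matching uses Python's difflib
rather than git's diff machinery.

```python
from difflib import SequenceMatcher


def blame_by_superproject(history):
    """Attribute each line of the newest file version to a superproject commit.

    `history` is a list of (super_commit, lines) pairs, oldest first, where
    `lines` is the submodule file as seen through the submodule pointer that
    the given superproject commit recorded (toy model, linear history only).
    """
    _, newest_lines = history[-1]
    blame = [None] * len(newest_lines)
    # cur_map[j] = index in the newest version that line j of the version
    # under inspection corresponds to, or None if that line did not survive.
    cur_map = list(range(len(newest_lines)))

    for idx in range(len(history) - 1, -1, -1):
        commit, lines = history[idx]
        # The oldest commit has no parent, so compare against an empty file.
        parent_lines = history[idx - 1][1] if idx > 0 else []
        matcher = SequenceMatcher(a=parent_lines, b=lines, autojunk=False)

        parent_map = [None] * len(parent_lines)
        matched = set()
        for a_start, b_start, size in matcher.get_matching_blocks():
            for k in range(size):
                # Unchanged line: pass responsibility on to the parent.
                parent_map[a_start + k] = cur_map[b_start + k]
                matched.add(b_start + k)

        for j, final in enumerate(cur_map):
            # A surviving line the parent did not have was introduced here.
            if j not in matched and final is not None:
                blame[final] = commit

        cur_map = parent_map

    return list(zip(blame, newest_lines))


# The tree from Emily's example: "a" points at sub commit 1, "b" at 4.
history = [
    ("a", [""]),
    ("b", ["", "int main() {", '  printf("Hello world!\\n");',
           "  return 0;", "}"]),
]
result = blame_by_superproject(history)
for commit, line in result:
    print(commit, line)
```

Only the super project commits are walked, so submodule commits 2 and 3
never appear in the output: the first line is attributed to "a" and the
rest to "b", as in the 'git blame sub/main.c' listing above.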
It's much more difficult to do that manually (hence why we invented
blame/annotate in the first place), and trying to go from
"git -C <submodule> blame file" to then figure out which super project
commit introduced the change is also tedious and non-trivial considering
you might now have intermediate or unrelated changes (i.e. it's actually
possible that that particular commit *never* made it into the super
project at all, because it got skipped over, and it might even be after
the file got re-written).

My idea for how blame of submodules works is to essentially pretend as if
you had subtree merged the contents of the submodule into regular parent
project files with those paths, and then do blame on that using just the
parent project history.... If that makes sense?

>
> And even if it is expensive, considering that Jacob and Randall both had
> different ideas of what their ideal 'git blame' recursive behavior would
> be, maybe it makes sense to use a flag to ask for the more expensive
> behavior, e.g. 'git blame --show-superproject-commit sub/main.c'?
>

Right, I imagine that in some ways both are useful, and it depends on the
context of what you're looking for. The reason I bring up the blame
example is because what I want is quite tedious to mimic by hand, and
requires more than just a simple git submodule foreach or a cd into the
submodule to operate on it as a standalone repository.

> - Emily