On April 20, 2021 2:48 PM, Emily Shaffer wrote: > On Tue, Apr 20, 2021 at 09:18:05AM -0700, Jacob Keller wrote: > > > > On Mon, Apr 19, 2021 at 12:28 PM Randall S. Becker > > <rsbecker@xxxxxxxxxxxxx> wrote: > > > On April 19, 2021 3:15 PM, Jacob Keller wrote: > > > > A sort of dream I had was a flow where I could do something from > > > > the parent like "git blame <path/to/submodule>/submodule/file" and > > > > have it present a blame of that files contents keyed on the > > > > *parent* commit that changed the submodule to have that line, as > > > > opposed to being forced to go into the submodule and figure out > > > > what commit introduced it and then go back to the parent and find > > > > out what commit changed the submodule to include that submodule > commit. > > > > > > Not going to disagree, but are you looking for the blame on the submodule > ref file itself or files in the submodule? It's hard to teach git to do a blame on a > one-line file. > > > > > > > Well, I would like if "git blame <path/to/submodule>" did.. something > > other than just fail. Sometimes my brain is working in a "blame where > > this came from" and I type that out and then get frustrated when it > > fails. Additionally... > > > > > Otherwise, and I think this is what you really are going for, teaching it to do > a blame based on "git blame <path/to/submodule>/submodule/file" would be > very nice and abstracts out the need for the user (or more importantly to me = > scripts) to understand that a submodule is involved; however, it is opening up a > very large door: "should/could we teach git to abstract submodules out of every > command". This would potentially replace a significant part of the use cases for > the "git submodule foreach" sub-command. In your ask, the current paradigm > "cd <path/to/submodule>/submodule && git blame file" or pretty much every > other command does work, but it requires the user/script to know you have a > submodule in the path. So my question is: is this worth the effort? I don't have a > good answer to that question. Half of my brain would like this very much/the > other half is scared of the impact to the code. > > > > > > Just my musings. > > > > I'm not asking for "git blame <path/to/submodule>/<file>" to give the > > the same outout as "cd <path/to/submodule> && git blame <file>" > > > > What i'm asking is: given this file, tell me which commit in the > > parent did the line get introduced. So basically I want to walk over > > the changes to the submodule pointer and find out when it get > > introduced into the parent, not when it got introduced into the > > submodule itself. > > > > This is a related question, but it is actually not trivial to go > > instantly from "it was in xyz submodule commit" to "it was then pulled > > in by xyz parent commit". It's something that is quite tedious to do > > manually, especially since the submodule pointer could change > > arbitrarily so knowing the submodule commit doesn't mean you can > > simply grep for which commit set the submodule exactly to that commit. > > Essentially, I want a 'git blame' that ignores all changes which > > aren't actually the submodule pointer, update. > > > > I think that's something that is much harder to do manually, but feels > > like it should be relatively simple to implement within the blame > > algorithm. I don't feel like this is something strictly replaceable by > > "git submodule foreach" > > I think I understand what you're saying. Something like the following > tree: > > super sub > b------->4 > 3 > 2 > a------->1 > > producing something like this: > > 'git -C sub blame main.c' > > 1 AU Thor 2020-01-01 > 2 CO Mitter 2020-01-02 int main() { > 4 AU Thor 2020-01-04 printf("Hello world!\n"); > 3 Dev E 2020-01-03 return 0; > 2 CO Mitter 2020-01-02 } > > and > 'git blame sub/main.c' > > a Mai N 2020-01-01 > b Senior Dev 2020-01-04 int main() { > b Senior Dev 2020-01-04 printf("Hello world!\n"); > b Senior Dev 2020-01-04 return 0; > b Senior Dev 2020-01-04 } > > or to put it another way: if we are treating superproject commit as "the whole > feature", then it could be useful to see "which feature added this change" > instead of "which atomic commit inside a feature added this change". > > To me, it sounds expensive to compute... wouldn't you need to say, for each > blame line, "is this commit an ancestor of the commit associated in THIS > superproject commit? ...how about the next superproject commit?" > But I also don't have much experience with the blame implementation so > maybe I'm thinking naively :) :) > > And even if it is expensive, considering that Jacob and Randall both had > different ideas of what their ideal 'git blame' recursive behavior would be, > maybe it makes sense to use a flag to ask for the more expensive behavior, e.g. > 'git blame --show-superproject-commit sub/main.c'? I was partly trying to figure out which path Jacob was requesting and "both" seem useful to me. Looking at our own super-repo history and comparing what is in one specific submodule, we have the commit on the submodule ref file flipping repeatedly between two commits during a period of time (in timestamp order) where an there were multiple submodule topic branches in parallel with super-repo topic branches. I'm not saying that was a good thing, just the reality of one particular icky submodule. Once back on the main branch, things moved to something rational, but a blame of changing submodule contents of sub/main.c --show-superproject-commit would lead to something inherently non-deterministic until the topic branches are all pruned and/or merged, at least in this degenerative situation. After the merge, everything was back to deterministic and simple to compute. Randall