On Wed, Aug 24, 2016 at 12:37 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Jeff King <peff@xxxxxxxx> writes: > > This seems to be dropped from the list, probably due to no "To:" > header in the original, which led to "no", "To-header" "on" and > "input <" on YOUR recipient list, so I am quoting it in full without > trimming. > >> On Wed, Aug 24, 2016 at 10:30:17AM -0700, Stefan Beller wrote: >> >>> When working with submodules, it is easy to forget to push the submodules. >>> The setting 'check', which checks if any existing submodule is present on >>> at least one remote of the submodule remotes, is designed to prevent this >>> mistake. >>> >>> Flipping the default to check for submodules is safer than the current >>> default of ignoring submodules while pushing. >> >> It is safer, and that's good. But it's also slower, because it requires >> an extra traversal of all of the pushed commits. And now people will >> have to pay the price even if they are not using submodules at all. >> >> For instance, try this from a checkout of linux.git: >> >> for i in no check; do >> rm -rf dst.git >> git init --bare dst.git >> echo "==> Pushing with submodules=$i" >> time git push --recurse-submodules=$i dst.git HEAD >> done >> >> The second case takes 30-40 seconds longer. This is a full push of >> history, so it's an extreme case[1], but it's still rather unfortunate. >> >> Can we tie this default to some sign that submodules are actually in >> use? I don't think the presence of .gitmodules is perfect (because you >> might be in a bare repo, for example, and have just fetched some other >> history you are relaying), but it may be a good compromise. I'm >> envisioning something like "--recurse-submodules=auto-check" which >> auto-detects common situations (e.g., presence of .gitmodules or >> .git/modules checkouts) and enables "check", and then setting the >> default to that in the long run. >> >> -Peff >> >> [1] Actually, there is another much worse case lurking there. Try: >> >> git push --recurse-submodules=check --mirror dst.git >> >> from the kernel. I didn't let it finish, but I'd estimate it would >> take on the order of 5 hours. The problem is that push feeds each >> updated ref tip to find_unpushed_submodules(), so we end up walking >> over the same history over and over. >> >> I think it should feed all of the "before" and "after" ref tips it >> proposes to update to a _single_ revision traversal. > > That sounds massively ... broken. So before even thinking about > flipping it to default, this needs to be fixed first. > I agree. That sounds bad. However having the --auto-check feels like papering over the actual problem which to me sounds like a design problem. However this may be a viable short term solution. About the design issue: We need to answer the question: Which submodule commits are referenced by a given set of superproject commits. This question is advancing a very similar question that we'd have to ask in git-gc. In gc we would end up without having to worry about a specific set, but rather the all reachable commits of the superproject are in the given set. So we could solve two issues at the same time if we had a quick way to answer this question quickly. And for that I considered introducing a ref in the submodule (e.g.) refs/superproject/gc_protector, which records both the superprojects commit as well as the submodule commit in case the superproject changed the submodule pointer. I have some rough patches for this, but I did not consider sending that to the list as the patches are still rough and not quite fully thought out on the design level. -- Actually screw this. After an office discussion with Jonathan (cc'd): 1) We need to have an "alternate refs namespace", which contains secondary data (as this can be derived from the primary data with lots of compute). So maybe we need a file similar to the packed refs file, "superproject-refs" that behaves as if it is storing refs, but that file is safe to delete as we know all its contents can be recreated. 2) In this new alternate refs space, we can have refs/superproject/<sha1> in the submodule with the sha1 of the superproject that points to a commit which is a dirty merge of all submodule commits, that have no other parents. e.g. In the superproject we might have a history of: A <- B <- C with A being origin/master and C our local master. In the submodule we have: D <- E <- F \ G F is the master branch in the submodule, G is a dangling commit. Now we could have the following git links: A -> D B -> G C -> F Then the refs/superproject/<C> would be a merge of F and G. When pushing in the superproject (A..C) we would need to make sure the submodule is updated as well, i.e. we'd look at refs/superproject/<C> to know we need to advance the remote submodule history to contain at least G and F. Then we can do a standard rev walk starting at G and F downwards until we hit a commit that is contained in remote/refs/*. > Why wasn't it discovered that "push.recursesubmodules = check" is > UNUSABLE since it was introduced is simply beyond me. Maybe it is not compatible with your workflow as you push only once a day? In a centralized server model you rarely exceed a few commits on push time, but $ git rev-list a829490...c08db5a |wc -l 23 $ git rev-list f5df0f2...f3c0fa5 |wc -l 61 $ git rev-list 8f73021...de36215 |wc -l 95 and some new branches, so you pushed around 200 commits. I think this misbehaves strongly for typical maintainer work flows. > I am mad (which is unusual for me, isn't it? -- I've seen somebody > else saying "I am mad", but I do not think I ever said it here). Please direct your anger at the design of submodules. The assumption of the submodule being sort of 'independant' doesn't allow for efficient data structures I would think. So maybe we need to teach the submodules about the existence and state of the superproject? /me ducks -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html