2018-03-06 22:21 GMT+02:00 Jonathan Nieder <jrnieder@xxxxxxxxx>: > > (cc list snipped) > Hi, > > Eddy Petrișor wrote: > > > Cc: [a lot of people] > > Can you say a little about how this cc list was created? E.g. should > "git send-email" get a feature to warn about long cc lists? I did it as advised by the documentation, git blame: https://github.com/git/git/blob/master/Documentation/SubmittingPatches#L264 I was suprised that there is no specialized script that does this, as it is for the kernel or u-boot, so I ran first git log --pretty=format:'%an <%ae>' git-submodule.sh | sort -u > mail-list-submodule then configure git to use that custom output and ignore the patch since it was trying to convert every line of it into a email address: git config sendemail.ccCmd 'cat mail-list-submodule # ' Then "git send-email 0001..." did the rest. > > > Signed-off-by: Eddy Petrișor <eddy.petrisor@xxxxxxxxx> > > --- > > > > There are projects such as llvm/clang which use several repositories, and they > > might be forked for providing support for various features such as adding Redox > > awareness to the toolchain. This typically means the superproject will use > > another branch than master, occasionally even use an old commit from that > > non-master branch. > > > > Combined with the fact that when incorporating such a hierachy of repositories > > usually the user is interested in just the exact commit specified in the > > submodule info, it follows that a desireable usecase is to be also able to > > provide '--depth 1' to avoid waiting for ages for the clone operation to > > finish. > > Some previous discussion is at > https://public-inbox.org/git/CAGZ79ka6UXKyVLmdLg_M5-sB1x96g8FRzRZy=ENy5aJBQf9_QA@xxxxxxxxxxxxxx/. > > In theory this should be straightforward: Git protocol allows fetching > an arbitrary commit, so "git submodule update" and similar commands > could fetch the submodule commit by SHA-1 instead of by refname. Poof! > Problem gone. > > In practice, some complications: > > - some servers do not permit fetch-by-sha1. For example, github does > not permit it. This is governed by the > uploadpack.allowReachableSHA1InWant / uploadpack.allowAnySHA1InWant > configuration items. That is the problem I have faced since I tried to clone this repo which has at lest 2 levels of submodules: https://github.com/redox-os/redox The problematic modules are: rust @ https://github.com/redox-os/rust/tree/81c2bf4e51647295d3d92952dbb0464b460df0c3 - first level and rust/src/llvm @ https://github.com/rust-lang/llvm/tree/ba2edd794c7def715007931fcd1b4ce62aa711c8 > > That should be surmountable by making the behavior conditional, but > it's a complication. Which forced me to try to fetch a branch on which that commit exists, but also consider providing --depth for the submodule checkout, effectively minimizing the amount of brought in history. > > > - When the user passes --depth=<num>, do they mean that to apply to > the superproject, to the submodules, or both? Documentation should > make the behavior clear. The supermodule clone already exists and that works OK; I was after providing something like 'git submodule update --depth 20 --recursive' or evn providing that 'depth' info via the .gitmodules parameters since 'shallow' is already used somehow, yet that is a bool value, not an integer, like depth expects: eddy@feodora:~/usr/src/redox/rust-eddy$ git config -f .gitmodules --list | egrep '(depth|shallow)' submodule.src/llvm.shallow=true submodule.src/rt/hoedown.shallow=true submodule.src/jemalloc.shallow=true submodule.src/liblibc.shallow=true submodule.src/doc/nomicon.shallow=true submodule.src/tools/cargo.shallow=true submodule.src/doc/reference.shallow=true submodule.src/doc/book.shallow=true submodule.src/tools/rls.shallow=true submodule.src/libcompiler_builtins.shallow=true submodule.src/tools/clippy.shallow=true submodule.src/tools/rustfmt.shallow=true submodule.src/tools/miri.shallow=true submodule.src/dlmalloc.shallow=true submodule.src/binaryen.shallow=true submodule.src/doc/rust-by-example.shallow=true submodule.src/llvm-emscripten.shallow=true submodule.src/tools/rust-installer.shallow=true > > Git submodule seems to be very stubborn and cloning master, although the > > wrapper script and the gitmodules-helper could work together to clone directly > > the branch specified in the .gitmodules file, if specified. > > This could make sense. For the same reason as --depth in the > superproject gives ambiguous signals about what should happen in > submodules, --single-branch in the superproject gives ambiguous > signals about what branch to fetch in submodules. I never thought of providing the branch in the command line, since that's not versionable info, but to provide git-submodule a hint in the .gitsubmodule config about on which branch the hash in question should be found, like this: eddy@feodora:~/usr/src/redox/rust-eddy$ git config -f .gitmodules --list | egrep branch submodule.src/llvm.branch=rust-llvm-release-4-0-1 submodule.src/rt/hoedown.branch=rust-2015-09-21-do-not-delete > > > Another wrinkle is that when the commit is not the tip of the branch, the depth > > parameter should somehow be stored in the .gitmodules info, but any change in > > the submodule will break the supermodule submodule depth info sooner or later, > > which is definitly frigile. > > Hm, this seems to go in another direction. I don't think we should > store the depth parameter in the .gitmodules file, since different > users are likely to have different preferences about what to make > shallow. If we make --depth easy enough to use at the superproject > level then the user can specify what they want there. Yes, but a) if the commit is not on the tip of that branch, b) the server does not support sha1 fetching and c) it is not expected to typically work in the submodules from the superproject checkout (e.g submodule is under another team's control) the typical usecase could be that the code is question is read-only and it should be OK to typically fetch a shallow clone of the submodule (by default). The only compromise at that point is to either choose a afe margin for the defalt depth in case the module branch is being modified. I solved a similar problem at work where having 2 shallow branches which had to be merged; since when trying to merge two branches without apprent common history git will do a merge commit, when a simple ff woudl wor, I wanted to find the merge-base. SInce the brances were shallow and they had to have a merge-base, the solution was to was to simply increase the --depth parameter in predifined steps (of 10 commits) and fetch both branches, check if the 'git merge-base' command manged to find a commit; if not the depthc parameter woudl be increased and the whole process would be started until that merge-base was found. I wonder if the same couldn't be done for the submodules that have submodule.<dir>.shallow setting, making the depth parameter hardcoding unnecessary, and guranteeing the submodule could be cloned at any given time, as long as the branch specified by submodule.<dir>.branch is not removed from the submodule repo. > > > I tried digging into this section of the code and debugging with bashdb to see > > where --depth might fit, but I got stuck on the shell-to-helper interaction and > > the details of the submodule implementation, so I want to lay out this first > > patch as starting point for the discussion in the hope somebody else picks it > > up or can provide some inputs. I have the feeling there are multiple code paths > > that are being ran, depending on the moment (initial clone, submodule > > recursive, post-clone update etc.) and I have a gut feeling there shouldn't be > > any code duplication just because the operation is different. > > > > This first patch is only trying to use a non-master branch, I have some changes > > for the --depth part, but I stopped working on it due to the "default depth" > > issue above. > > > > Does any of this sound reasonable? > > Is this patch idea usable or did I managed to touch the part of the code that > > should not be touched? > > I agree with the goal. As mentioned above, I'm not confident about > the particular mechanism --- e.g. something using fetch-by-sha1 seems > likely to be more intuitive. > > Today, the 'branch' setting in .gitmodules is only for "git submodule > update --remote". This patch would be a significant expansion in > scope for it. Hopefully others on the list can talk more about how > that fits into various workflows and whether it would work out well. Thanks for the input, it was helpful anyway. Now that I think of it, I think the "fetch more and more from the branch until we find the sha1" could work; in the worst case you would end up going to the beginning of the repo in case the submodule history was rewritten or if the branch specified in the supermodule submodule.<path>.branch was removed. I am aware this is not ellegant at all (what is the default "step"?), yet the submodule part of the code already struggles in that area because it clearly looks like an afterthought, anyway. > > Thanks and hope that helps, > Jonathan -- Eddy Petrișor