2018-03-08 21:29 GMT+02:00 Eddy Petrișor <eddy.petrisor@xxxxxxxxx>: > 2018-03-06 22:21 GMT+02:00 Jonathan Nieder <jrnieder@xxxxxxxxx>: >> >> (cc list snipped) >> Hi, >> >> Eddy Petrișor wrote: >> >> > Cc: [a lot of people] >> >> Can you say a little about how this cc list was created? E.g. should >> "git send-email" get a feature to warn about long cc lists? > > > I did it as advised by the documentation, git blame: > > https://github.com/git/git/blob/master/Documentation/SubmittingPatches#L264 > > I was suprised that there is no specialized script that does this, as > it is for the kernel or u-boot, so I ran first > > git log --pretty=format:'%an <%ae>' git-submodule.sh | sort -u > > mail-list-submodule > > then configure git to use that custom output and ignore the patch > since it was trying to convert every line of it into a email address: > > git config sendemail.ccCmd 'cat mail-list-submodule # ' > > Then "git send-email 0001..." did the rest. > > >> >> > Signed-off-by: Eddy Petrișor <eddy.petrisor@xxxxxxxxx> >> > --- >> > >> > There are projects such as llvm/clang which use several repositories, and they >> > might be forked for providing support for various features such as adding Redox >> > awareness to the toolchain. This typically means the superproject will use >> > another branch than master, occasionally even use an old commit from that >> > non-master branch. >> > >> > Combined with the fact that when incorporating such a hierachy of repositories >> > usually the user is interested in just the exact commit specified in the >> > submodule info, it follows that a desireable usecase is to be also able to >> > provide '--depth 1' to avoid waiting for ages for the clone operation to >> > finish. >> >> Some previous discussion is at >> https://public-inbox.org/git/CAGZ79ka6UXKyVLmdLg_M5-sB1x96g8FRzRZy=ENy5aJBQf9_QA@xxxxxxxxxxxxxx/. >> >> In theory this should be straightforward: Git protocol allows fetching >> an arbitrary commit, so "git submodule update" and similar commands >> could fetch the submodule commit by SHA-1 instead of by refname. Poof! >> Problem gone. >> >> In practice, some complications: >> >> - some servers do not permit fetch-by-sha1. For example, github does >> not permit it. This is governed by the >> uploadpack.allowReachableSHA1InWant / uploadpack.allowAnySHA1InWant >> configuration items. > > > That is the problem I have faced since I tried to clone this repo > which has at lest 2 levels of submodules: > https://github.com/redox-os/redox > > The problematic modules are: > rust @ https://github.com/redox-os/rust/tree/81c2bf4e51647295d3d92952dbb0464b460df0c3 > - first level > > and > > rust/src/llvm @ > https://github.com/rust-lang/llvm/tree/ba2edd794c7def715007931fcd1b4ce62aa711c8 > > >> >> That should be surmountable by making the behavior conditional, but >> it's a complication. > > > Which forced me to try to fetch a branch on which that commit exists, > but also consider providing --depth for the submodule checkout, > effectively minimizing the amount of brought in history. > >> >> >> - When the user passes --depth=<num>, do they mean that to apply to >> the superproject, to the submodules, or both? Documentation should >> make the behavior clear. > > > The supermodule clone already exists and that works OK; I was after > providing something like 'git submodule update --depth 20 --recursive' > or evn providing that 'depth' info via the .gitmodules parameters > since 'shallow' is already used somehow, yet that is a bool value, not > an integer, like depth expects: > > > eddy@feodora:~/usr/src/redox/rust-eddy$ git config -f .gitmodules > --list | egrep '(depth|shallow)' > submodule.src/llvm.shallow=true > submodule.src/rt/hoedown.shallow=true > submodule.src/jemalloc.shallow=true > submodule.src/liblibc.shallow=true > submodule.src/doc/nomicon.shallow=true > submodule.src/tools/cargo.shallow=true > submodule.src/doc/reference.shallow=true > submodule.src/doc/book.shallow=true > submodule.src/tools/rls.shallow=true > submodule.src/libcompiler_builtins.shallow=true > submodule.src/tools/clippy.shallow=true > submodule.src/tools/rustfmt.shallow=true > submodule.src/tools/miri.shallow=true > submodule.src/dlmalloc.shallow=true > submodule.src/binaryen.shallow=true > submodule.src/doc/rust-by-example.shallow=true > submodule.src/llvm-emscripten.shallow=true > submodule.src/tools/rust-installer.shallow=true > > >> > Git submodule seems to be very stubborn and cloning master, although the >> > wrapper script and the gitmodules-helper could work together to clone directly >> > the branch specified in the .gitmodules file, if specified. >> >> This could make sense. For the same reason as --depth in the >> superproject gives ambiguous signals about what should happen in >> submodules, --single-branch in the superproject gives ambiguous >> signals about what branch to fetch in submodules. > > > I never thought of providing the branch in the command line, since > that's not versionable info, but to provide git-submodule a hint in > the .gitsubmodule config about on which branch the hash in question > should be found, like this: > > eddy@feodora:~/usr/src/redox/rust-eddy$ git config -f .gitmodules > --list | egrep branch > submodule.src/llvm.branch=rust-llvm-release-4-0-1 > submodule.src/rt/hoedown.branch=rust-2015-09-21-do-not-delete > >> >> > Another wrinkle is that when the commit is not the tip of the branch, the depth >> > parameter should somehow be stored in the .gitmodules info, but any change in >> > the submodule will break the supermodule submodule depth info sooner or later, >> > which is definitly frigile. >> >> Hm, this seems to go in another direction. I don't think we should >> store the depth parameter in the .gitmodules file, since different >> users are likely to have different preferences about what to make >> shallow. If we make --depth easy enough to use at the superproject >> level then the user can specify what they want there. > > > Yes, but > a) if the commit is not on the tip of that branch, > b) the server does not support sha1 fetching and > c) it is not expected to typically work in the submodules from the > superproject checkout (e.g submodule is under another team's control) > > the typical usecase could be that the code is question is read-only > and it should be OK to typically fetch a shallow clone of the > submodule (by default). The only compromise at that point is to either > choose a afe margin for the defalt depth in case the module branch is > being modified. > > I solved a similar problem at work where having 2 shallow branches > which had to be merged; since when trying to merge two branches > without apprent common history git will do a merge commit, when a > simple ff woudl wor, I wanted to find the merge-base. > SInce the brances were shallow and they had to have a merge-base, the > solution was to was to simply increase the --depth parameter in > predifined steps (of 10 commits) and fetch both branches, check if the > 'git merge-base' command manged to find a commit; if not the depthc > parameter woudl be increased and the whole process would be started > until that merge-base was found. Another optimization could be for submodule.<path>.shallow to imply --single-branch, too. > I wonder if the same couldn't be done for the submodules that have > submodule.<dir>.shallow setting, making the depth parameter hardcoding > unnecessary, and guranteeing the submodule could be cloned at any > given time, as long as the branch specified by submodule.<dir>.branch > is not removed from the submodule repo. > >> >> > I tried digging into this section of the code and debugging with bashdb to see >> > where --depth might fit, but I got stuck on the shell-to-helper interaction and >> > the details of the submodule implementation, so I want to lay out this first >> > patch as starting point for the discussion in the hope somebody else picks it >> > up or can provide some inputs. I have the feeling there are multiple code paths >> > that are being ran, depending on the moment (initial clone, submodule >> > recursive, post-clone update etc.) and I have a gut feeling there shouldn't be >> > any code duplication just because the operation is different. >> > >> > This first patch is only trying to use a non-master branch, I have some changes >> > for the --depth part, but I stopped working on it due to the "default depth" >> > issue above. >> > >> > Does any of this sound reasonable? >> > Is this patch idea usable or did I managed to touch the part of the code that >> > should not be touched? >> >> I agree with the goal. As mentioned above, I'm not confident about >> the particular mechanism --- e.g. something using fetch-by-sha1 seems >> likely to be more intuitive. >> >> Today, the 'branch' setting in .gitmodules is only for "git submodule >> update --remote". This patch would be a significant expansion in >> scope for it. Hopefully others on the list can talk more about how >> that fits into various workflows and whether it would work out well. > > > Thanks for the input, it was helpful anyway. > > Now that I think of it, I think the "fetch more and more from the > branch until we find the sha1" could work; in the worst case you would > end up going to the beginning of the repo in case the submodule > history was rewritten or if the branch specified in the supermodule > submodule.<path>.branch was removed. Actually, the branch removal in the submodule would lead to a failure to fetch from the get go, so the worst case is if history is rewritten because the sha1 would be missing fromt he branch. Antoher non-fatal perfomance issue would be if the desired commit is very close to the initial commit of the repo. > I am aware this is not ellegant at all (what is the default "step"?), > yet the submodule part of the code already struggles in that area > because it clearly looks like an afterthought, anyway. -- Eddy Petrișor