On Fri, May 20, 2016 at 10:00 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>> Right. But upon finding the new name for clone, I wondered why
>> this has to be submodule specific. The attr pathspecs are also working
>> with any other files. So if you don't use submodules, I think it would be
>> pretty cool to have a
>>
>>     git clone --sparse-checkout=Documentation/ ...
>
> It would be cool, but aren't "sparse" and the various existing
> status "submodule" has very different things?

Yes they are. In one of the various "submodule groups" series I proposed
a "defaultGroup" which allows commands to ignore some submodules.
That was conceptually the very same as a "sparse checkout, just for
submodules", i.e. the submodule is initialized and has a directory as a
placeholder, but most commands ignore its existence.

We decided that was a bad thing, so now I think of a lightweight
"submodule.updateGroup" which holds a pathspec and is only used for
"submodule update" commands that have no explicit pathspec given.
(That setting would be set via "git clone --submodule-pathspec <pathspec>")

>
>  - A submodule can be uninitialized, in which case you do get an empty
>    directory but you do not see .git in it.
>
>  - A path can be excluded by the sparse checkout mechanism, in which
>    case you do not get _anything_ in the filesystem.

Yes, but isn't that one of the minor issues?

>
> So "git clone --sparse-exclude=Documentation/" that does not waste
> diskspace for Documentation/ directory may be an interesting thing
> to have, and "git clone --sparse-exclude=submodule-dir/" that does
> not even create submodule-dir/ directory may also be, but the latter
> is quite different from a submodule that is not initialized in a
> superproject that does not use any "sparse" mechanism.
>
> Besides, I think (improved) submodule mechanism would be a good way
> forward for scalability, and "sparse" hack is not (primarily because
> it still populates the index fully with 5 million entries even when
> your attention is narrowed only to a handful of directories with
> 2000 leaf entries; this misdesign requires other ugly hacks to be
> piled on, like untracked cache and split index).
>
> I do not think we want "submodule" to be tied to and dependent on
> the latter.

Ok, I just wanted to probe how much resistance I get here as an indicator
of how much more work that would be.

Besides, I think an (improved) sparse mechanism would be a good way to
not confuse users between submodule scalability and single repo
scalability. ;)
We don't have to keep 5 million things in the index there, but we can stop
at the tree/directory level, i.e. if a whole directory is excluded,
that's all we'd need to keep a record of, no?

As a user I'd prefer to be exposed to as few concepts as possible, and
adding yet another concept of sparseness is not a good thing IMHO, so
I'll try to keep it simple there.

Thanks,
Stefan