On Mon, Jun 19, 2017 at 1:20 PM, Yaroslav Halchenko <yoh@xxxxxxxxxxxxxx> wrote:
>
> On Mon, 19 Jun 2017, Stefan Beller wrote:
>
>> On Mon, Jun 19, 2017 at 8:59 AM, Yaroslav Halchenko <yoh@xxxxxxxxxxxxxx> wrote:
>> > Hi All,
>
>> > On a recent trip I've listened to the git minutes podcast episode and
>> > got excited to hear Stefan Beller (CCed just in case) describing
>> > ongoing work on submodules mechanism. I got excited, since e.g.
>> > performance improvements would be of great benefit to us too.
>
>> If you're mostly interested in performance improvements of the status
>> quo (i.e. "make git-submodule fast"), then the work of Prathamesh
>> Chavan (cc'd) might be more interesting to you than what I do.
>> He is porting git-submodule (which is mostly a shell script nowadays)
>> to C, such that we can save a lot of process invocations and can do
>> processing within one process.
>
> ah -- cool. I would be eager to test it out, thanks! would be
> interesting to see if it positively affects our overall performance.
> Pointers to that development would be welcome!

The latest from today:
https://public-inbox.org/git/CAME+mvUQJFneV7b1G7zmAidP-5L=nimvY43V0ug-Gtesr83tzg@xxxxxxxxxxxxxx/

>
>> > http://datasets.datalad.org ATM provides quite a sizeable (ATM 370
>> > repositories, up to 4 levels deep) hierarchy of git/git-annex
>> > repositories all tied together via git submodules mechanism. And as the
>> > collection grows, interactions with it become slower, so additional
>> > options (such as --ignore-submodules=dirty to status) become our
>> > friends.
>
>> I am not so much concerned about the 370 number as about the
>> 4 layers of nesting. In my experience the nested submodule case
>> is a little bit error prone and the bug reports are not as frequent as
>> there are not as many users of nesting, yet(?)
>
> well -- part of the story here is that we are forced to use/have full
> blown .git/ directories (for git-annex symlinks to content files to
> work) within submodules instead of .git file with a reference under
> parent's .git/modules. So we can 'slice' at any level and I
> guess that is why we may be avoiding some possible issues due to nesting
> and the "parent has all .git/modules" approach.

That sounds like you either want to configure to have the submodules'
git dirs in-place or you want to convince git-annex to learn about the
gitdir pointer files.

>
>> In a neighboring thread on the mailing list we have a discussion
>> on the usefulness of being on branches rather than in detached HEAD
>> in the submodules.
>> https://public-inbox.org/git/0092CDD27C5F9D418B0F3E9B5D05BE08010287DF@xxxxxxxxxxxxxxxxxxxxxxxx/
>
>> This would not break non-ambiguously, rather it would add
>> ease of use.
>
> that is indeed a common caveat... I am not sure if any heuristic
> approach would provide a 'bullet proof' solution. I might even prefer a
> hardcoded 'branch-name' to be listed/associated with each submodule
> within .gitmodules.

hardcoded as submodule.NAME.branch, maybe?
https://git-scm.com/docs/gitmodules

> In the datalad case, detached HEAD is common

So you are accustomed to detached HEADs and would not
gain much from being back on a branch? That's cool, too.

> whenever someone installs "outdated" (branch of which progressed
> forward) submodule. In this case we just check if the branch after "git
> clone" (but before git submodule update) includes the commit pointed to
> by the Subproject, and if so -- we announce that it must be the branch
> (so far it is always "master" branch anyways ;) )

heh, having just one branch. That is retro-style. :)
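
For reference, the containment check described above (does the branch left
checked out by "git clone" include the commit recorded in the superproject?)
could be sketched roughly as follows. This is only an illustration under
assumptions, not datalad's code: the function names, the `candidates`
parameter and the injectable `is_ancestor` hook are made up here. The only
git command it relies on is `git merge-base --is-ancestor`, which exits 0
when the first commit is an ancestor of the second.

```python
import subprocess
from typing import Callable, Optional, Sequence


def git_is_ancestor(repo: str, commit: str, branch: str) -> bool:
    """Return True if `commit` is reachable from `branch` in the repo at `repo`."""
    # `git merge-base --is-ancestor A B` exits 0 if A is an ancestor of B
    # (a commit counts as its own ancestor) and 1 if it is not. Any other
    # exit status (e.g. unknown commit) is treated as "not contained" here.
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, branch],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def guess_branch(
    repo: str,
    commit: str,
    candidates: Sequence[str] = ("master",),
    is_ancestor: Callable[[str, str, str], bool] = git_is_ancestor,
) -> Optional[str]:
    """Return the first candidate branch that contains `commit`, else None.

    `candidates` and `is_ancestor` are hypothetical knobs for this sketch;
    the hook exists only so the logic can be exercised without a repository.
    """
    for branch in candidates:
        if is_ancestor(repo, commit, branch):
            return branch
    return None
```

The commit recorded for a submodule can be read in the superproject with
`git rev-parse HEAD:<path>`; passing that and the submodule's checkout
directory to `guess_branch` would give the branch to announce, or None if
the pinned commit is on none of the candidates.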