Hi all, Thanks to my employer's generous "Hack Week" policy[0], I have the luxury of being able to spend most of next week hacking on a git commit dependency inference tool which I built 14 months ago but never got round to polishing up or publically announcing. In this email I'll briefly explain the tool and some ideas I have for adding a web-based UI to it next week - any feedback is most welcome. [0] https://hackweek.suse.com/ Background theory ================= It is fairly clear that two git commits within a single repo can be considered "independent" from each other in a certain sense, if they do not change the same files, or if they do not change overlapping parts of the same file(s). In contrast, when a commit changes a line, it is "dependent" on not only the commit which last changed that line, but also any commits which were responsible for providing the surrounding lines of context, because without those previous versions of the line and its context, the commit's diff might not cleanly apply[1]. So all dependencies of a commit can be programmatically inferred by running git-blame on the lines the commit changes, plus however many lines of context make sense for the use case of this particular dependency analysis. Therefore the dependency calculation is impacted by a "fuzz" factor (c.f. patch(1)) parameter, i.e. the number of lines of context which are considered necessary for the commit's diff to cleanly apply. As with many dependency relationships, these dependencies form edges in a DAG (directed acyclic graph) whose nodes correspond to commits. Note that a node can only depend on a subset of its ancestors. [1] Depending on how it's being applied, of course. Motivation ========== Sometimes it is useful to understand the nature of parts of this DAG, as its nature will impact the success or failure of operations including merge, rebase, cherry-pick etc. For example when porting a commit "A" between git branches via git cherry-pick, it can be useful to programmatically determine in advance the minimum number of other dependent commits which would also need to be cherry-picked to provide the context for commit "A" to cleanly apply. Another use case might be to better understand levels of specialism / cross-functionality within an agile team. If I author a commit which modifies (say) lines 34-37 and 102-109 of a file, the authors of the dependent commits forms a list which indicates the group of people I should potentially consider asking to review my commit, since I'm effectively changing "their" code. Monitoring those relationships over time might shed some light on how agile teams should best coordinate efforts on shared code bases. I'm sure there are other use cases I haven't yet thought of. At first I thought that it might provide a useful way to programmatically predict whether operations such as merge / rebase / cherry-pick would succeed, but actually it's probably cheaper and more reliable simply to perform the operation and then roll back. BTW the dependency graph is likely to be semantically incomplete; for example it would not auto-detect dependencies between a commit A which changes code and another commit B which changes documentation or tests to reflect the code changes in commit A. (Although of course it's usually best practice to logically group such changes together in a single commit.) But this should not stop it from being useful. Current status ============== I have written a tool called git-deps which automatically walks this graph: https://github.com/aspiers/git-config/blob/master/bin/git-deps I haven't yet documented it or formally announced it until now, but it's a single Python script, and usage is fairly self-explanatory: $ git deps -h usage: git-deps [options] COMMIT-ISH [COMMIT-ISH...] Auto-detects commits which the given commit(s) depend on. optional arguments: -h, --help show this help message and exit -l, --log Show commit logs for calculated dependencies [False] -r, --recurse Follow dependencies recursively [False] -e COMMITISH, --exclude-commits COMMITISH Exclude commits which are ancestors of the given COMMITISH (can be repeated) -c NUM, --context-lines NUM Number of lines of diff context to use [1] -d, --debug Show debugging [False] By default it will list all dependencies of the given commit-ish(s), but with --recurse it will one dependency (i.e. two SHA1s representing a graph edge) per line. There is still plenty of scope for optimization, e.g. it only takes partial advantage of pygit2. Future plans ============ 1. Interactive graph visualization Currently the output is text only, but I think it would be more useful to visualise the dependencies as an interactive graph where you could zoom in/out, pan around, hover over each node to see commit meta-data, click on leaf nodes to request further recursion, and so on. Nodes could be coloured according to commit author, and sized according to the diffstats. Clearly this should be cross-platform and based on some modern rendering technology, so HTML/CSS/Javascript seems the obvious choice. Dependency inference is too expensive to generate the full graph as a static web page, so I plan to extend the tool so it can act as a lightweight web server, e.g. $ git deps --web --port 8080 and then you could simply point your browser at http://localhost:8080 to interact with the graph. It might look a little like this: http://marvl.infotech.monash.edu/webcola/examples/downwardedges.html but with interactive zoom/pan/hover/click functionality like this: http://marvl.infotech.monash.edu/webcola/examples/onlinebrowse.html Since a lot of the hard work is already done by cola.js in the above examples, most likely I will use that in conjunction with d3.js for rendering: http://marvl.infotech.monash.edu/webcola/ Since the tool is already written in Python, I am considering using a very lightweight web framework such as Flask: http://flask.pocoo.org/ (I suspect Django would be overkill for this application which is essentially stateless.) Another approach might be to integrate it into an existing git web frontend written in Python. However I trawled through https://git.wiki.kernel.org/index.php/Interfaces,_frontends,_and_tools#Web_Interfaces but couldn't find any Python-based frontend which looked like it was in active development. Perhaps the most promising I could find was: http://git.kaarsemaker.net/goblet/ 2. Performance improvements The tool should make better use of pygit2, since blame support was not complete when it was originally written. It also still forks git-merge-base. 3. Documentation 4. Tests Yes - embarrassing to admit I wrote this as a quick hack without following TDD. In my defence, I was doing it in coffee breaks at an openSUSE conference ;-p Request for feedback ==================== Any kind of feedback is very welcome - obviously sooner rather than later, as my Hack Week starts on Monday. Here's the project page: https://hackweek.suse.com/11/projects/366 Many thanks in advance! Adam -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html