On Sun, 26 Nov 2006, Junio C Hamano wrote:
>
> I think "networking" vs "packet filtering" largely depends on
> how the networking subsystem you pull from is managed. If
> netfilter comes as e-mailed patches to DaveM and are applied
> onto the trunk of networking subsystem, we will face exactly the
> same problem as we have with Andrew's patchbomb to your trunk.

Most of the subsystems end up using patches - they're simply better ways
to move things around and have people comment on them than saying "please
pull on this tree to see my suggestion". I do it myself: even when I
_generate_ the diff in my tree, I will often just do a

	git diff > ~/diff

and then import the thing into my mailer, and say "Maybe something like
this?".

So I think patches are fundamentally the core way to get things in the
periphery into just about any system. Maybe we do it more than most just
because we're so _used_ to them, but I actually think that if the kernel
does it more than most (and I'm not sure it does), it's simply because
the thing about patches is that they really _work_.

So yes, the network subsystem tends to be entirely linear by the time it
hits me. That's true of a lot of other subsystems too (SCSI etc). There's
a _few_ subsystems that actually have real topic branches: ACPI and
network driver development come to mind, but it seems to actually be the
exception rather than the rule.

(I think that a lot of people work like I occasionally do: they do have
their own local branches for some stuff, but they end up re-linearizing
and keeping them active with "git rebase", so the branches really are
purely local, rather than something that is visible in the end result).

But the REAL reason I'd love to see a smarter "data-mining" git log
(whether it does things by bayesian clustering or any other kind of
grouping technology) is that this is actually something that people ask
for: when I make my "git shortlog" for major releases, the thing is often
thousands of lines long, and it would be _beautiful_ if that could be
data-mined somewhat more intelligently.

So, for example, do a simple

	git shortlog v2.6.17..v2.6.18

(with the shortlog in "next" that can do this - btw, why doesn't it
default to using PAGER like "git log" does?), and realize that it's about
8500 lines of stuff, and nobody can really be expected to read it. It's
not a "shortlog", in other words.

So what would a _nice_ "shortlog" do? I'd _love_ to see ways to make it
more concise, more "short", for something like this. Look at the output
as a _non_kernel_ person, and what does it tell you? Not a lot. It's just
too big.

Examples of what I think would be _really_ useful (much more so than
going by "topic branches", even if they existed):

 - Clustering.

   The author-based clustering does work, but it would be even better to
   cluster by other methods ("subsystem" - either by subdirectory, or by
   noticing filename patterns, or even patterns in the patches: there's a
   lot of academic work on clustering human text, perhaps not as much on
   clustering patches). A rough sketch of the subdirectory version is at
   the end of this mail.

 - Shortening

   The "shortlog" often isn't. It's wonderful for small things as-is, but
   once it reaches a hundred lines or more, it's less so.
   It would often be nice to be able to say "only show the 100 biggest
   patches" (or preferably something smarter like "the 25 biggest
   clusters, with a short 4-line clustering explanation", but even just
   the "biggest patches" is useful in itself and much simpler). Again,
   there's a sketch of this at the end of this mail.

 - External annotations (eventually)

   One of the things that people like LWN editor Jonathan Corbet would
   want is a way to say which patches are "important". But the thing is,
   "importance" is (a) fleeting and (b) not necessarily as obvious when
   the commit is made as it is afterwards.

   So you cannot (and must not) mark things "important" at commit time,
   and it thus can't really be part of the repo itself, but at the same
   time, this is definitely something that _could_ be somehow
   logged/annotated externally.

Now, I realize that these are all pipe-dreams, but so was my old "a
better annotate than annotate" a year or two ago. So I'm not saying that
people should work on this, I'm just saying that it's worth perhaps
thinking about, because I think the git model does actually give us the
power to _do_ things like this. Eventually.

And the reason? Performance! Git is fast enough that we really _can_
afford to do things like "generate diffs for every single commit in the
range v2.6.17..v2.6.18", and it takes me just 20 seconds to do on a
reasonable machine with "git log -p".

So good performance means that we can _afford_ to do a diffstat for
everything (or just raw diffs, to make it even cheaper - quite often you
care more about _which_ files and how many files something touched than
about the actual size of the diff in those files), and use that diffstat
to some day generate shortlogs that are more useful for people like
Jonathan Corbet and others who just want to get an overview of "what
happened".

		Linus
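
PS. Just to make the "cluster by subsystem" idea a bit more concrete,
here's a rough and totally untested sketch. It simply assumes that the
top-level directory of the files a commit touches is a good-enough
stand-in for "subsystem", which is obviously a crude approximation:

	# For every file touched in the range, take its top-level
	# directory and count how many touched files fall under each one.
	git log --pretty=format: --name-only v2.6.17..v2.6.18 |
		grep -v '^$' |
		cut -d/ -f1 |
		sort | uniq -c | sort -rn | head -n 25

That only tells you which areas saw the most churn, not what actually
changed in them, but it's the kind of grouping a smarter shortlog could
start from.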
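In the same vein, "only show the 100 biggest patches" is pretty much just
a diffstat sort. Again an untested sketch, using the line counts from
--numstat as a crude measure of "big":

	# Sum added+deleted lines per commit, then list the largest commits.
	git log --numstat --pretty=format:'commit %h %s' v2.6.17..v2.6.18 |
		awk '/^commit / { id = substr($0, 8); next }
		     NF == 3   { size[id] += $1 + $2 }
		     END       { for (c in size) print size[c], c }' |
		sort -rn | head -n 100

None of this is the real clustering, but it shows how cheap the raw data
is to get at.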