Hi David, it is good practice to Cc: the original author of the code in question, in this case Junio. I guess he sees this anyway, but that is really just an assumption. On Fri, 27 May 2016, David Kastrup wrote: > When a parent blob already has chunks queued up for blaming, dropping > the blob at the end of one blame step will cause it to get reloaded > right away, doubling the amount of I/O and unpacking when processing a > linear history. It is obvious from your commit message that you have studied the code quite deeply. To make it easier for the reader (which might be your future self), it is advisable to give at least a little bit of introduction, e.g. what the "parent blob" is. I would *guess* that it is the blob corresponding to the same path in the parent of the current revision, but that should be spelled out explicitly. > Keeping such parent blobs in memory seems like a reasonable > optimization. It's conceivable that this may incur additional memory This sentence would be easier to read if "It's conceivable that" was simply deleted. > pressure particularly when the history contains lots of merges from > long-diverged branches. In practice, this optimization appears to > behave quite benignly, Why not just stop here? I say that because... > and a viable strategy for limiting the total amount of cached blobs in a > useful manner seems rather hard to implement. ... this sounds awfully handwaving. Since we already have reference counting, it sounds fishy to claim that simply piggybacking a global counter on top of it would be "rather hard". > In addition, calling git-blame with -C leads to similar memory retention > patterns. This is a red herring. Just delete it. I, for one, being a heavy user of `git blame`, could count the number of times I used blame's -C option without any remaining hands. Zero times. Besides, -C is *supposed* to look harder. By that argument, you could read all blobs in rev-list even when the user did not specify --objects "because --objects leads to similar memory retention patterns". So: let's just forget about that statement. The commit message is missing your sign-off. Also: is there an easy way to reproduce your claims of better I/O characteristics? Something like a command-line, ideally with a file in git.git's own history, that demonstrates the I/O before and after the patch, would be an excellent addition to the commit message. Further: I would have at least expected some rudimentary discussion why this patch -- which seems to at least partially contradict 7c3c796 (blame: drop blob data after passing blame to the parent, 2007-12-11) -- is not regressing on the intent of said commit. > diff --git a/builtin/blame.c b/builtin/blame.c > index 21f42b0..2596fbc 100644 > --- a/builtin/blame.c > +++ b/builtin/blame.c > @@ -1556,7 +1556,8 @@ finish: > } > for (i = 0; i < num_sg; i++) { > if (sg_origin[i]) { > - drop_origin_blob(sg_origin[i]); > + if (!sg_origin[i]->suspects) > + drop_origin_blob(sg_origin[i]); > origin_decref(sg_origin[i]); > } It would be good to mention in the commit message that this patch does not change anything for blobs with only one remaining reference (the current one) because origin_decref() would do the same job as drop_origin_blob when decrementing the reference counter to 0. In fact, I suspect that simply removing the drop_origin_blob() call might result in the exact same I/O pattern. Ciao, Johannes -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html