Jeff King <peff@xxxxxxxx> writes: > Yeah, agreed. I started to fix this up with a use/unuse pattern and > realized something: all of the call sites are calling logmsg_reencode > anyway, because that is the next logical step in doing anything with the > buffer that is not just parsing out the parent/timestamp/tree info. And > since that function already might allocate (for the re-encoded copy), > callers have to handle the maybe-borrowed-maybe-free situation already. > > So I came up with this patch series, which I think should fix the > problem, and actually makes the call-sites easier to read, rather than > harder. > > [1/3]: commit: drop useless xstrdup of commit message > [2/3]: logmsg_reencode: never return NULL > [3/3]: logmsg_reencode: lazily load missing commit buffers > > Here's the diffstat: > > builtin/blame.c | 22 ++------- > builtin/commit.c | 14 +----- > commit.h | 1 + > pretty.c | 93 ++++++++++++++++++++++++++--------- > t/t4042-diff-textconv-caching.sh | 8 +++ > 5 files changed, 85 insertions(+), 53 deletions(-) > > Not too bad, and 27 of the lines added in pretty.c are new comments > explaining the flow of logmsg_reencode. So even if this doesn't get > every case, I think it's a nice cleanup. This looks very good. I wonder if this lets us get rid of the hack in cmd_log_walk() that does this: while ((commit = get_revision(rev)) != NULL) { if (!log_tree_commit(rev, commit) && rev->max_count >= 0) rev->max_count++; ! if (!rev->reflog_info) { ! /* we allow cycles in reflog ancestry */ free(commit->buffer); commit->buffer = NULL; ! } free_commit_list(commit->parents); commit->parents = NULL; After log_tree_commit() handles the commit, using the buffer, we discard the memory associated to it because we know we no longer will use it in normal cases. The "do not do that if rev->reflog_info is true" was added in a6c7306 (--walk-reflogs: do not crash with cyclic reflog ancestry, 2007-01-20) because the second and subsequent display of "commit" (which happens to occur only when walking reflogs) needs to look at commit->buffer again, and this hack forces us to retain the buffer for _all_ commit objects. But your patches could be seen as a different (and more correct) way to fix the same issue. Once the display side learns how to re-read the log text of the commit object, the above becomes unnecessary, no? We may still be helped if majority of commit objects that appear in the reflog appear more than once, in which case retaining the buffer for _all_ commits could be an overall win. Not having to read the buffer for the same commit each time it is shown for majority of multiply-appearing commits, in exchange for having to keep the buffer for commits that appears only once that are minority and suffering increasted page cache pressure. That could be seen as an optimization. But that is a performance thing, not a correctness issue, so "we allow cycles" implying "therefore if we discard the buffer, we will show wrong output" becomes an incorrect justification. I happen to have HEAD reflog that is 30k entries long; more than 26k represent a checkout of unique commit. So I suspect that the above hack to excessive retain commit->buffer for already used commits will not help us performance-wise, either. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html