Phillip Wood <phillip.wood123@xxxxxxxxx> writes: >> @@ -6405,6 +6428,11 @@ void diff_flush(struct diff_options *options) >> DIFF_QUEUE_CLEAR(q); >> if (options->close_file) >> fclose(options->file); >> + for (i = 0; i < options->ignore_regex_nr; i++) { >> + regfree(options->ignore_regex[i]); >> + free(options->ignore_regex[i]); >> + } >> + free(options->ignore_regex); > > If I run `git log -p -I foo` then the address sanitizer reports > > AddressSanitizer: heap-use-after-free xdiff/xdiffi.c:1027 in > record_matches_regex > > after it has printed the diff for the first commit. I think freeing > the regex here is the cause of the problem. Yes, this is a good location to clear the diff_queue, which is per-invocation. But diff_options->ignore_regex[] can and should be shared across multiple invocations of the diff machinery, as we are parsing upfront in the command line option parser just once to be used throughout the life of the program. It is wrong to free them early like this. I really hate to suggest this, one common API call made into the diff machinery from various consumers, when they are absolutely done with diff_options, is probably diff_result_code(). Its purpose is to collect bits computed to participate in the overall result into the final result code and anybody who ran one or more diff and wants to learn the outcome must call it, so it is unlikely that callers that matter are missing a call to finalize their uses of the diff machinery. So if freeing .ignore_regex[] is really necessary, it could be used to tell us where to do so [*1*]. Unlike per-invocation resources that MUST be freed for repeated invocations of the diff machinery made in "git log" family, resources held for and repeatedly used during the entire session like this may not have to be freed, to be honest, though. Thanks. [Footnote] *1* Adding calls to free() in diff_result_code() directly is probably not a good idea. Some callers may not even need result code (e.g. "blame") and do not call it at all, but we may want to free the resource in diff_options that is not per-invocation. So if we were to do this, we probably need a new API function, perhaps diff_options_clear() or something, and call that from diff_result_code(), and then find callers that do not care about the result code and call diff_options_clear() directly from them. If a caller does a single diff_setup_done(), makes repeated calls to the diff machinery, and calls diff_result_code() once per each iteration (e.g. imagine "git log -p" that detects the status for each commit), then having the diff_options_clear() call in diff_result_code() will break the machinery, so if we were to follow the outline given in the previous paragraph, we need to make sure there is no such caller. But I see that diff_result_code() does not clear the fields it looks at in the diff_options fields after it uses them, so the second and subsequent calls into the diff machinery by such a caller may already be broken (not in the "resource leakage" sense, but in the correctness sense).