On Wed, Jan 23, 2013 at 04:25:01PM -0800, Junio C Hamano wrote: > > With the object cache, isn't modifying the object unsafe in > > general? Instead of auditing code paths, it's now necessary to > > audit _all_ code that uses "struct object", which seems > > infeasible. > > The object layer was designed with "run one thing and one thing > well, and then let the _exit(2) take care of the clean-up" model in > mind; modifying the object, e.g. updating commit->parents list, > in-core by revision traversal machinery is very much within the > scope of its intended uses. Yeah, although I think the "parsed" flag really is a bit overloaded for commits. Most code which uses "struct commit" will want the parents, timestamp, and tree information filled in. And so they call parse_commit to make sure that is the case. Afterwards, those fields are valid and the "parsed" flag is set. The buffer field may or may not be valid afterwards, depending on save_commit_buffer. Some other code may also care about the rest of the commit information. It also calls parse_commit, but with save_commit_buffer turned on. However, that may or may not actually initialize the buffer, depending on who has been using the object before us. It works in practice most of the time because we only perform one "task" per invocation, and that task either keeps the commit messages around the first time, or doesn't. But it is a bit of a ticking time bomb anytime we violate that assumption. I think it would be saner for callers who only care about ancestry info call to call parse_commit, and then anybody who is about to access the buffer to call a new ensure_commit_buffer function, which would make sure the buffer is accessible (even if save_commit_buffer is false). Then save_commit_buffer becomes just an optimization: save it for later during parsing, so we don't have to do it later. That would also let us optimize better if we end up with a commit ancestry cache which can do parse_commit much cheaper than accessing the full object. For example, my commit cache patches hook into parse_commit, but they can only do so safely when save_commit_buffer is false. So git-rev-list benefits, but git-log does not. However, if we knew that log would lazily load the commit data when needed, we could use the cache there, too. It would not be a win if you are showing every commit anyway, but if you have limiting that does not depend on the commit object itself (e.g., path limiting in the tree), you could avoid loading some commits entirely. > > I'm just trying to fix the segfault demonstrated in the test > > attached to the patch. > > Can offending readers that dereference NULL without checking if > buffer has been freed be updated so that they read_sha1_file(), read > the contents from the result returned from the function (instead of > reading from .buffer), and free the memory when they are done? Looks like builtin/blame.c:get_commit_info does this already, although it leaves the buffer attached to the commit and does not free it. The ensure_commit_buffer function could look something like: int ensure_commit_buffer(struct commit *item) { enum object_type type; unsigned long size; if (!item) return -1; if (!item->object.parsed) return parse_commit(item); if (item->buffer) return 0; item->buffer = read_sha1_file(item->object.sha1, &type, &size); if (!item->buffer) return error("Could not read %s", sha1_to_hex(item->object.sha1); return 0; } and blame could replace its in-line processing with that. It does leave open the question of whether callers should free() the result. But that would be up to each user of the object (and it would be an optimization issue, not a correctness issue, as long as each user called ensure_commit_buffer before accessing it). But the first step there would be to audit all of the accesses of commit->buffer to make sure that they call ensure_commit_buffer (presumably they already call parse_commit). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html