[Adding Jeff King to CC; I meant to copy you in the original but forgot, sorry] On 2013-01-23, at 14:19, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Jonathon Mah <jmah@xxxxxx> writes: > >> Add a new function "free_object_buffer", which marks the object as >> un-parsed and frees the buffer. Only trees and commits have buffers; >> other types are not affected. If the tree or commit buffer is already >> NULL, the "parsed" flag is still cleared so callers can control the free >> themselves (index-pack.c uses this). >> >> Several areas of code would free buffers for object structs that >> contained them ("struct tree" and "struct commit"), but without clearing >> the "parsed" flag. parse_object would clear the flag for "struct tree", >> but commits would remain in an invalid state (marked as parsed but with >> a NULL buffer). Because the objects are uniqued (ccdc6037fee), the >> invalid objects stay around and can lead to bad behavior. >> >> In particular, running pickaxe on all refs in a repository with a cached >> textconv could segfault. If the textconv cache ref was evaluated first >> by cmd_log_walk, a subsequent notes_cache_match_validity call would >> dereference NULL. > > Conceptually this is a right thing to do, but it is unclear why this > conversion is safe in the existing code. > > A codepath that used to free() and assign NULL to a buffer without > resetting .parsed would have assumed that it can find out the parsed > properties of the object (e.g. .parents) without re-parsing the > object, and much more importantly, the modifications made by that > codepath will not be clobbered by later call to parse_object(). > > IIRC, revision traversal machinery rewrites commit->parents but > discards buffer when it knows that the log message is not needed > (save_commit_buffer controls this behaviour). I do not offhand know > if there are other instances of existing code that depend on the > current behaviour, but have you audited all the codepaths that are > affected by this patch and codepaths that work on objects this patch > unmarks their .parsed field will not have such a problem? No, I haven't audited the code paths (I have only the loosest familiarity with the source). Indeed, I found that clearing the 'parsed' flag in fsck.c (traverse_one_object()) is incorrect and causes test failures. With the object cache, isn't modifying the object unsafe in general? Instead of auditing code paths, it's now necessary to audit _all_ code that uses "struct object", which seems infeasible. Anyway, I don't care about the implementation (Junio does that extremely well), I'm just trying to fix the segfault demonstrated in the test attached to the patch. Jonathon Mah me@xxxxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html