On Mon, Sep 09, 2019 at 09:39:41AM -0700, Junio C Hamano wrote: > Jeff King <peff@xxxxxxxx> writes: > > > On Fri, Sep 06, 2019 at 09:59:04AM -0700, Junio C Hamano wrote: > > > >> Jeff King <peff@xxxxxxxx> writes: > >> > >> > So far so good. But now imagine we call parse_commit_buffer() again, and > >> > we re-parse. How does that interact with the half-parsed state? Some of > >> > it works OK (e.g., lookup_tree() would find the same tree). Some not so > >> > much (I think we'd keep appending parents at each call). > >> > > >> > I guess this might not be too bad to handle. Value fields like > >> > timestamp_t are OK to overwrite. Pointers to objects likewise, since the > >> > memory is owned elsewhere. If we see existing parent pointers in an > >> > object we're parsing, we could probably free them under the assumption > >> > they're leftover cruft. Likewise for the "tag" field of "struct tag", > >> > which is owned by the struct and should be freed. > >> > >> Yeah, or clear them before returning with .corrupt bit set? > > > > This was my attempt to avoid dealing with a .corrupt bit. :) > > Then, clear them before returning with .parsed bit clear? But then somebody calling parse_commit() and getting an error return doesn't get to see the full extent of what we managed to parse. Which I thought was one of your original points: people should be able to get whatever information we can get out of even a corrupted or malformed object. If we keep that requirement, then we have to either clear them at the start of the re-parsing step, or we have to avoid re-parsing entirely by using an extra bit to say "parsed but had an error". I've sent messy "something like this" patches for both strategies. I think the latter is cleaner conceptually, but the former doesn't require giving up a flag bit. -Peff