Re: [RFC PATCH 1/1] commit-graph.c: die on un-parseable commits

Jeff King <peff@xxxxxxxx> · Mon, 9 Sep 2019 12:54:20 -0400

On Mon, Sep 09, 2019 at 09:39:41AM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > On Fri, Sep 06, 2019 at 09:59:04AM -0700, Junio C Hamano wrote:
> >
> >> Jeff King <peff@xxxxxxxx> writes:
> >> 
> >> > So far so good. But now imagine we call parse_commit_buffer() again, and
> >> > we re-parse. How does that interact with the half-parsed state? Some of
> >> > it works OK (e.g., lookup_tree() would find the same tree). Some not so
> >> > much (I think we'd keep appending parents at each call).
> >> >
> >> > I guess this might not be too bad to handle. Value fields like
> >> > timestamp_t are OK to overwrite. Pointers to objects likewise, since the
> >> > memory is owned elsewhere. If we see existing parent pointers in an
> >> > object we're parsing, we could probably free them under the assumption
> >> > they're leftover cruft. Likewise for the "tag" field of "struct tag",
> >> > which is owned by the struct and should be freed.
> >> 
> >> Yeah, or clear them before returning with .corrupt bit set?
> >
> > This was my attempt to avoid dealing with a .corrupt bit. :)
> 
> Then, clear them before returning with .parsed bit clear?

But then somebody calling parse_commit() and getting an error return
doesn't get to see the full extent of what we managed to parse. Which I
thought was one of your original points: people should be able to get
whatever information we can get out of even a corrupted or malformed
object.

If we keep that requirement, then we have to either clear them at the
start of the re-parsing step, or we have to avoid re-parsing entirely by
using an extra bit to say "parsed but had an error".  I've sent messy
"something like this" patches for both strategies.

I think the latter is cleaner conceptually, but the former doesn't
require giving up a flag bit.

-Peff