On 9/5/2019 2:47 AM, Jeff King wrote: > On Wed, Sep 04, 2019 at 05:18:47PM -0400, Taylor Blau wrote: > >> All of this makes sense to me, so I'm wondering what part(s) of this you >> feel are worth addressing in this first patch series. Presumably, there >> is a longer series that we _could_ write which would introduce a new >> 'corrupt' field and then check for it here. >> >> But, I'm hesitant to write those patches, since I only have this one >> call-site in mind. If we introduce 'corrupt', I feel it would be best to >> use it uniformly, instead of only checking it here, and relying on other >> bespoke mechanisms to detect corruption elsewhere. >> >> So, I'm content to write the pseudo-code you provided above (which is to >> say, call and check both 'parse_commit_no_graph', _and_ >> 'get_commit_tree_oid'), because I think that it's expedient, and fix the >> issue which I'm pointing out here. > > I'd actually be willing to just take the patch you have here, and > consider the "parsed but we saw an error" thing as an oddity of the > object code. IOW, we shouldn't _have_ to be double-checking here. > Looking for an error return from parse_commit() should really be all a > caller needs to do. Once that's fixed, then your code would just be > doing the right thing. > > That said, there's another unhandled case, I think: lookup_tree() might > return NULL (if somebody previously saw that oid as a non-tree), and > parse_commit() wouldn't even notice and return an error! > > IMHO that's also something that parse_commit() should be returning an > error for. And it's probably a lot easier to trigger versus the "parsed > earlier but corrupted" thing. > > So it might be worth doing the NULL tree check here in the meantime. I > dunno. > > Below is a sketch of what I'm thinking parse_commit() should do: > > - remember when an earlier parse returned an error, so we can repeat > that error (this requires some unfortunate bit-field adjusting) > > - notice a lookup_tree failure > > - likewise, notice a lookup_parent failure > > --- > diff --git a/commit.c b/commit.c > index a98de16e3d..7e415932b7 100644 > --- a/commit.c > +++ b/commit.c > @@ -391,7 +391,9 @@ const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep) > return ret; > } > > -int parse_commit_buffer(struct repository *r, struct commit *item, const void *buffer, unsigned long size, int check_graph) > +static int parse_commit_buffer_1(struct repository *r, struct commit *item, > + const void *buffer, unsigned long size, > + int check_graph) > { > const char *tail = buffer; > const char *bufptr = buffer; > @@ -401,9 +403,6 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b > const int tree_entry_len = the_hash_algo->hexsz + 5; > const int parent_entry_len = the_hash_algo->hexsz + 7; > > - if (item->object.parsed) > - return 0; > - item->object.parsed = 1; > tail += size; > if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) || > bufptr[tree_entry_len] != '\n') > @@ -412,6 +411,10 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b > return error("bad tree pointer in commit %s", > oid_to_hex(&item->object.oid)); > set_commit_tree(item, lookup_tree(r, &parent)); > + if (!item->maybe_tree) > + return error("bad tree pointer %s in commit %s", > + oid_to_hex(&parent), > + oid_to_hex(&item->object.oid)); > bufptr += tree_entry_len + 1; /* "tree " + "hex sha1" + "\n" */ > pptr = &item->parents; > > @@ -431,15 +434,19 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b > if (graft && (graft->nr_parent < 0 || grafts_replace_parents)) > continue; > new_parent = lookup_commit(r, &parent); > - if (new_parent) > - pptr = &commit_list_insert(new_parent, pptr)->next; > + if (!new_parent) > + return error("bad parent %s in commit %s", > + oid_to_hex(&parent), > + oid_to_hex(&item->object.oid)); > + pptr = &commit_list_insert(new_parent, pptr)->next; > } > if (graft) { > int i; > struct commit *new_parent; > for (i = 0; i < graft->nr_parent; i++) { > new_parent = lookup_commit(r, > &graft->parent[i]); > + /* Here we ignore bogus grafts. Also should be an error? */ > if (!new_parent) > continue; > pptr = &commit_list_insert(new_parent, pptr)->next; > @@ -453,6 +460,23 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b > return 0; > } > > +int parse_commit_buffer(struct repository *r, struct commit *item, > + const void *buffer, unsigned long size, > + int check_graph) > +{ > + int ret; > + > + if (item->object.parsed) > + return item->object.corrupt ? -1 : 0; > + item->object.parsed = 1; > + > + ret = parse_commit_buffer_1(r, item, buffer, size, check_graph); > + if (ret < 0) > + item->object.corrupt = 1; > + > + return ret; > +} > + > int repo_parse_commit_internal(struct repository *r, > struct commit *item, > int quiet_on_missing, > diff --git a/object.h b/object.h > index 0120892bbd..b83d3964ad 100644 > --- a/object.h > +++ b/object.h > @@ -59,7 +59,7 @@ struct object_array { > > /* > * object flag allocation: > - * revision.h: 0---------10 25----28 > + * revision.h: 0---------10 24----27 > * fetch-pack.c: 01 > * negotiator/default.c: 2--5 > * walker.c: 0-2 > @@ -78,13 +78,14 @@ struct object_array { > * builtin/show-branch.c: 0-------------------------------------------26 > * builtin/unpack-objects.c: 2021 > */ > -#define FLAG_BITS 29 > +#define FLAG_BITS 28 > > /* > * The object type is stored in 3 bits. > */ > struct object { > unsigned parsed : 1; > + unsigned corrupt : 1; > unsigned type : TYPE_BITS; > unsigned flags : FLAG_BITS; > struct object_id oid; > diff --git a/revision.h b/revision.h > index 4134dc6029..5c0b831b37 100644 > --- a/revision.h > +++ b/revision.h > @@ -33,7 +33,7 @@ > #define ALL_REV_FLAGS (((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR) > > #define TOPO_WALK_EXPLORED (1u<<27) > -#define TOPO_WALK_INDEGREE (1u<<28) > +#define TOPO_WALK_INDEGREE (1u<<24) As an aside, these flag bit modifications look fine, but would need to be explained. I'm guessing that since you are adding a bit of data to struct object you want to avoid increasing the struct size across a 32-bit boundary. Are we sure that bit 24 is not used anywhere else? (My search for "1u<<24" found nothing, and "1 << 24" found a bit in the cache-entry flags, so this seems safe.) Thanks, -Stolee