Jeff King <peff@xxxxxxxx> wrote: > On Thu, May 26, 2016 at 08:02:36AM +0000, Eric Wong wrote: > > > > That's probably OK in practice, as I think the actual pack-write already > > > does a linear walk of the object table to generate the pack index. So > > > presumably nobody checkpoints often enough for it to be a problem. And > > > the solution would be to keep a separate list of pointers to objects for > > > the current pack-id, which would trivially fix both this case and the > > > one in create_index(). So we can punt on it until somebody actually > > > runs into it, I think. > > > > I might checkpoint that much and run into the problem soon :) > > Too tired now; maybe in a day or two I'll be able to make sense > > of C again :x OK, I guess I was too tired and not thinking clearly. I now figure it's not worth it to checkpoint frequently and have a very-long-lived fast-import process. The object table and marks set will grow indefinitely, so forking off another gfi with a new set of marks is better for my case[1], anyways. Junio: I think my RFC can go into pu as-is and replace Jeff's object_table discarding patch. [1] inotify watching a Maildir + fast-import > In theory the list of objects in the allocation pool are contiguous for > a particular pack. So you could break out of the double-loop at the > first non-matching object you see: > > diff --git a/fast-import.c b/fast-import.c > index 83558dc..f214bd3 100644 > --- a/fast-import.c > +++ b/fast-import.c > @@ -900,10 +900,13 @@ static const char *create_index(void) > c = idx; > for (o = blocks; o; o = o->next_pool) > for (e = o->next_free; e-- != o->entries;) > if (pack_id == e->pack_id) > *c++ = &e->idx; > + else > + goto done; > +done: > last = idx + object_count; > if (c != last) > die("internal consistency error creating the index"); > > tmpfile = write_idx_file(NULL, idx, object_count, &pack_idx_opts, pack_data->sha1); > > But that seems to break some of the tests, so I think there is some case > which does not match my theory. :) Yes, it's probably because duplicate objects from the object store will create entries with e->pack_id == MAX_PACK_ID > Another strategy is to note that we cross-check against object_count > here. So you could simply break out of the loop when we have found > object_count matches. I'm worried that could fail to detect bugs which should've led us to die of the internal consistency error. What we could probably do more safely is limit the scan to only recent object pools (new ones since the previous create_index call and the last one scanned). This passes tests, but I could be missing something: --- a/fast-import.c +++ b/fast-import.c @@ -926,14 +926,16 @@ static const char *create_index(void) struct pack_idx_entry **idx, **c, **last; struct object_entry *e; struct object_entry_pool *o; + static struct object_entry_pool *last_seen; /* Build the table of object IDs. */ ALLOC_ARRAY(idx, object_count); c = idx; - for (o = blocks; o; o = o->next_pool) + for (o = blocks; o; o = (o == last_seen) ? NULL : o->next_pool) for (e = o->next_free; e-- != o->entries;) if (pack_id == e->pack_id) *c++ = &e->idx; + last_seen = blocks; last = idx + object_count; if (c != last) die("internal consistency error creating the index"); -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html