On Fri, Jan 22, 2021 at 06:24:59PM -0500, Jeff King wrote: > On Wed, Jan 13, 2021 at 05:28:11PM -0500, Taylor Blau wrote: > > [...] > > qsort? Don't we have a perfectly good radix sort for this exact purpose? :) Yeah, I think your argument against it (which I agree with is basically): We could, and it would be faster, but it would (1) require allocating twice as much data, and (2) the speed increase is a drop in the bucket on any repository with enough objects to even notice it. I'll add something to that effect to the commit message. > Dereferencing a pointer to index another array always makes me nervous > that we may have a bounds problem with bogus data. > > In this case we know it is OK because we filled the array ourselves with > in-bound numbers in write_rev_index_positions. Right. > I was surprised we didn't define these already on the reading side, but > it looks like we didn't. In patch 1, we probably should be checking > RIDX_SIGNATURE in load_revindex_from_disk(). But much more importantly, > we should be checking that we find version 1, since that's what will > make it safe to later invent a version 2. :-). Thanks for noticing, fixed. > I forgot to comment on this in patch 1, but: I think the format is > really independent of the hash size. The contents are identical for a > sha-1 versus sha-256 file. Right; the format is identical. That is, the fields are the same, but the hashes are longer. > That said, I don't overly mind having a hash identifier if it might help > debug things (OTOH, how the heck do you end up with one that matches the > trailer's packfile but _doesn't_ match the trailer's contents?). > > If we do have it, should we also be checking it in the loading function? I'm not opposed to it, but I imagined that this field would be primarily for debugging purposes. I was surprised to learn that we _don't_ verify the checksum for .idx files. So, I'm reluctant to start doing so here, honestly. > So if the caller gave us a name, we force-overwrite it. That seemed > weird to me at first, but it makes sense; the atomic rename-into-place > writers will not be passing in the name. And this is exactly how > write_idx_file() works. Yep. > I wonder if we could factor out some of this repeated logic, but I > suspect it is mostly diminishing returns. Maybe this "open a pack file > for writing" could become a helper function, though. Yeah. I tried factoring it out before replying, and it's a little gross. Most of my discomfort with it lies in the complexity of the parameters. Consider extracting the code in write_idx_file(): - There would need to be a double pointer for 'index_name' (which is sometimes read, and sometimes written to). - There would be an unsigned bit for "verify" (i.e., "open with hashfd_check() or not"). - There would be a "pattern" variable if we were creating a new temporary file with odb_mkstemp(). Having the caller be forced to juggle the combinations of passing NULL/0 or not for each of those three makes me leery that this is worth doing, so I tend to agree with your judgement that this provides a diminishing return. Thanks, Taylor