Re: [PATCH v1] read_index_from(): Skip verification of the cache entry order to speed index loading

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 19 Oct 2017 14:22:13 +0900

Ben Peart <benpeart@xxxxxxxxxxxxx> writes:

> There is code in post_read_index_from() to catch out of order entries
> when reading an index file.  This order verification is ~13% of the cost
> of every call to read_index_from().

I find this a bit over-generalized claim---wouldn't the overhead
depend on various conditions, e.g. the size of the index and if
split-index is in effect?

In general, I get very skeptical towards any change that makes the
integrity of the data less certain based only on microbenchmarks,
and prefer to see a solution that can absorb the overhead in some
other way.

When we are using split-index, the current code is not validating
the two input files from the disk. Because merge_base_index()
depends on the base to be properly sorted before the overriding
entries are added into it, if the input from disk is in a wrong
order, we are screwed already, and the order check in post
processing is pointless.  If we want to do this order validation, I
think we should be doing it in do_read_index() where it does
create_from_disk() and the set_index_entry(), instead of having it
as a separate phase that scans a potentially large index array one
more time.  And doing so will not penalize the case where we do not
use split-index, either.

So, I think I like the direction of getting rid of the order
validation in post_read_index_from(), not only during the normal
operation but also in fsck.  I think it makes more sense to do so
incrementally inside do_read_index() all the time and see how fast
we can make it do so.