On 3/22/2019 1:37 AM, Junio C Hamano wrote:
"Jeff Hostetler via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
Teach "multi-pack-index verify" to handle cases where the number of
packfiles exceeds the open file handle limit.
The first commit fixes a problem that prevented the LRU-style
close_one_pack() mechanism from working, which caused midx verify to run out
of file descriptors.
The second commit teaches midx verify to sort the set of objects to verify
by packfile rather than verifying them in OID order. This eliminates the
need to have more than one packfile/idx open at the same time.
With the second commit, runtime on 3600 packfiles went from 12 minutes to 25
seconds.
These references to the first and second commits might have become
stale across iterations, but logically it makes sense---the first
point is about correctness (i.e. do not die by running out of fds)
and the second one is about usable-performance.
But in this round (possibly in the previous one, too?) the "group
objects by packfile" one addresses both points?
Sorry, I forgot to remove the stale content in the cover letter for
the V3 version.
This version has only the sort-by-packfile commit. Because it keeps
only one packfile open at a time, it does not need the change to add
packfiles to the packed-git list: it never triggers the
close_one_pack() problem.
We suspect there are other places (not-yet-observed) where the design of
the all-packs and packed-git lists will lead to similar fd exhaustion
errors and want to fix it properly in the packfile and/or midx code.
We'll address this potential problem in a future patch series.
Jeff