On Fri, Sep 10, 2021 at 10:00:26PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> >  	if (ends_with(file_name, ".idx")) {
> >  		display_progress(ctx->progress, ++ctx->pack_paths_checked);
> > -		if (ctx->m && midx_contains_pack(ctx->m, file_name))
> > -			return;
> > +		if (ctx->m) {
> > +			if (midx_contains_pack(ctx->m, file_name))
> > +				return;
> > +		} else if (ctx->to_include) {
> > +			if (!string_list_has_string(ctx->to_include, file_name))
> > +				return;
>
> What's the expected number of elements on the to_include list?  I am
> wondering about the performance implications of using linear search
> over the string-list, of course.  Is it about the same order of the
> number of packfiles in a repository (up to several dozens, or 1000
> at most unless you are insane, or something like that)?

You're definitely in the right ballpark. It depends on the repack
settings and size of repository, of course, but I imagine that roughly
1,000 entries would be the most anybody could ever pass (e.g., during a
`--geometric` repack, the biggest pack would have to contain 2^1000
times as many objects as the smallest pack).

Of course, you could just constantly be adding packs and doing
incremental `git repack -d --write-midx`. Seems unlikely to me, but if
it does become a problem we could easily read the values into a hashmap
and constant-ize the lookup.

But the scan is logarithmic, not linear, since the string list is
sorted.

Thanks,
Taylor
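
To illustrate the point about sorted lookups: the sketch below is a
minimal, standalone example of why membership tests on a sorted list of
names cost O(log n) comparisons. It does not use Git's string-list API;
`struct name_list`, `name_list_sort()` and `name_list_has()` are
hypothetical names made up for this example, built on the standard
`qsort()` and `bsearch()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a sorted string list; not Git's struct string_list. */
struct name_list {
	const char **items;	/* kept in strcmp() order once sorted */
	size_t nr;
};

/* qsort()/bsearch() hand us pointers to elements, i.e. to (const char *). */
static int cmp_names(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;
	return strcmp(*sa, *sb);
}

/* Sort once up front so that later lookups can binary-search. */
static void name_list_sort(struct name_list *list)
{
	assert(list);
	qsort(list->items, list->nr, sizeof(*list->items), cmp_names);
}

/* Return 1 if name is present, 0 otherwise; O(log n) string comparisons. */
static int name_list_has(const struct name_list *list, const char *name)
{
	assert(list && name);
	return bsearch(&name, list->items, list->nr,
		       sizeof(*list->items), cmp_names) != NULL;
}
```

With roughly 1,000 entries, each lookup needs on the order of ten
string comparisons, so a hashmap would only matter if the list grew far
beyond that.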