Re: [PATCH v2 1/5] sparse-checkout: refactor skip worktree retry logic

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/27/24 8:51 PM, Elijah Newren wrote:
On Thu, Jun 27, 2024 at 1:59 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:

Both original and the rewritten code shares one trait, which is that
it goes from the beginning to check all paths that are marked with
SKIP_WORKTREE bit to clear CE_SKIP_WORKTREE bits from them, until
they find a S_ISSPARSEDIR entry and lazily call ensure_full_index(),
but then when retrying after calling ensure_full_index(), it
restarts from the beginning.  I wonder if it would help, especially
if the S_ISSPARSEDIR entry comes very late in the index (e.g., by

"the S_ISSPARSEDIR entry"?  Are you assuming there is only one?

In fact, the situation is more drastic when there are many sparse
directories and this is one late in the list. The ensure_full_index()
call could cause many new entries that should be no-ops earlier in
the lex order.

returning "here is where we have already checked during the first
run, until we realized that we first need to do ensure_full_index()"
to the caller from here, and then the caller tells the second phase
to restart from there), to reduce the number of calls to path_found()?

My first assumption about how to "restart from there", was to pass the
current index, i, to the more involved check.  That might fail to be
the correct index when there are multiple S_ISSPARSEDIR entries and
the first one (or more) are actually missing from the working copy as
expected.  However, if "restart from there" is just passing the
name of the relevant directory and we do some kind of binary search to
find where the earliest entry under that directory would exist in the
expanded index, then that seems like a reasonable additional
optimization.

Yes, you're on the right track. Requires replacing the response from
the sparse loop to be the existing cache entry position or path
instead of only a boolean "should restart" indicator.

Another option would be to explore the whole index for the sparse
directories that exist, then expand the index only to those paths.
That should be fastest, but would require using the partially-
expanded state that has so far only happened inside of the
sparse-checkout builtin.

All this to say: there are some interesting directions to go from
here. I'm willing to take some time to explore the options. Should
we consider merging this improvement now and then consider further
improvements in a follow-up series (if they are fruitful)?

Thanks,
-Stolee




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux