2.34 regression (and workaround): deleting untracked files both outside *and inside* desired sparsity cone

Hi,

I've got a proposal for changing the sparse-checkout command slightly;
but it probably doesn't make sense without the context of the bugs
(old and new) we are facing.  Consider this an RFC, with the final
bullet point particularly in need of comment and ideas.

== Background ==

sparse-checkouts in cone mode are documented as being obtained either
by using the `--sparse` flag to `git clone`, or by running the
sequence:

    git sparse-checkout init --cone [--sparse-index]
    git sparse-checkout set ...

The first step has traditionally deleted all the tracked files from
the working tree, except in the toplevel directory, and the second
restores all the tracked files that are wanted.

(Usage context:)
My understanding is Microsoft never uses this sequence, instead using
the --sparse flag to `git clone`.  In contrast, at Palantir the
--sparse flag to clone is rarely used.

(An aside on pre-existing bugs/warts of the above sequence:)
This has always been bad from a performance point of view (especially
in more extreme not-so-sparse cases when the desired sparsity paths
represent roughly half of all paths), and has been suboptimal from a
UI point of view due to the dual progress meters (other wrappers, such
as an in-house `sparsify`, can have a single user-facing command for
switching to a sparse-checkout that has to call both of these git
commands under the hood; that wrapper would prefer one progress meter
to two).  But it never quite rose to the level of needing to be
fixed.

== The (New) Bug ==

Starting with Git 2.34, each step will delete all ignored files
outside the sparsity paths specified to the individual command in
question.  We are totally on board with deleting ignored files outside
the sparsity paths the user wants, but the first command is required
according to the documentation and does not allow specifying any
sparsity paths.  Since it does not allow specifying any sparsity
paths, it treats *everything* as outside and essentially deletes all
ignored files everywhere.  That's not workable for us.  We want a
single command for changing to a sparse-checkout.
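
(A reproduction sketch:)
To check how a given Git version behaves, something like the following
could be used.  This is only a sketch: the function name
`repro_init_ignored`, the throwaway repository, and the file names are
all made up for illustration.  It creates one tracked file and one
ignored file in the same directory, runs the first documented step, and
reports whether the ignored file survived.

```shell
#!/bin/sh
# Hypothetical reproduction sketch; repo layout and names are invented.
repro_init_ignored() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        mkdir -p src
        echo '*.o' >.gitignore
        echo 'int main(void) { return 0; }' >src/main.c
        git add .gitignore src/main.c
        git -c user.name=Example -c user.email=example@example.com \
            commit -q -m 'initial commit'

        # An ignored build product inside a directory we intend to keep.
        echo 'object code' >src/main.o

        # First documented step; no sparsity paths can be given here.
        git sparse-checkout init --cone >/dev/null 2>&1

        if test -e src/main.o
        then
            echo "ignored file kept"
        else
            echo "ignored file deleted"
        fi
    )
    rm -rf "$repo"
}
```

Calling `repro_init_ignored` prints one of the two messages; going by
the description above, an affected version would be expected to report
the file as deleted.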

== The Current Workaround ==

Luckily, having these two commands separate isn't enforced, and the
first command is roughly equivalent to setting a few config variables
and then running `sparse-checkout set` with an empty set of paths.
So, currently, we can just do the config setting part of init
manually, skip the rest of init, and then call our desired `set`
command:

    git config extensions.worktreeConfig true
    git config --worktree core.sparseCheckout true
    git config --worktree core.sparseCheckoutCone true
    git sparse-checkout set ...

Since we're using a wrapper anyway (for computing dependencies and
determining the list of directories to include), it was relatively
easy for us to add this workaround.
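
(A sketch of the wrapper side:)
The workaround could be folded into a small shell function along these
lines.  This is a minimal sketch, not our actual wrapper: the name
`sparse_set_without_init` is hypothetical, and it assumes the caller
has already computed the list of directories to include.

```shell
#!/bin/sh
# Hypothetical helper: do the config part of `init` by hand, then go
# straight to `set`, so no step ever runs with an empty set of paths.
sparse_set_without_init() {
    if test "$#" -eq 0
    then
        echo "usage: sparse_set_without_init <dir>..." >&2
        return 1
    fi

    # The config settings that `init --cone` would otherwise make.
    git config extensions.worktreeConfig true &&
    git config --worktree core.sparseCheckout true &&
    git config --worktree core.sparseCheckoutCone true &&

    # Single pass over the working tree with the paths actually wanted.
    git sparse-checkout set "$@"
}
```

For example, `sparse_set_without_init src docs` run from the top of a
working tree.  Note that, like the manual sequence above, this does
nothing about --sparse-index.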

However, it is not clear that our current workaround will continue
functioning with future versions of git, particularly if
`sparse-checkout init` gains more options.  In fact, it already
doesn't handle --sparse-index.

== Long term proposal ==

Make `set` do both the work of `init` and `set`.

This means:
  * `set` gains the ability to parse both --cone and --sparse-index
(in addition to --stdin, etc.)
  * If the sparse-index is not initialized, `set` does the
initialization work of `init`.
  * Modify the `init` documentation to mark it as deprecated,
mentioning the 2-3 bugs above as reasons why.
  * We could effectively just turn `git sparse-checkout init ...` into
an alias for `git sparse-checkout set ...`, since init's parameters
would be a subset of those that `set` accepts.  However, such an alias
might interact badly with allowing a user to toggle sparse-index on
and off in the middle of a sparse-checkout...so maybe we need
something more?  Alternatively, we could leave `init` as-is and just
consider it set in concrete, possibly risking it becoming
non-functional in a future upgrade.  Hmm...


Thoughts?


