Re: Auto packing the repository - foreground or background in Windows?

On 12/8/2022 9:52 AM, Tao Klerks wrote:
> On Tue, Dec 6, 2022 at 7:03 PM Derrick Stolee <derrickstolee@xxxxxxxxxx> wrote:
>>
>> Instead, the modern recommendation for repositories where "git gc --auto"
>> would be slow is to run "git maintenance start" which will schedule
>> background maintenance jobs with the Windows scheduler. Those processes
>> are built to do updates that are non-invasive to concurrent foreground
>> processes. It also sets config to avoid "git gc --auto" commands at the
>> end of foreground Git processes.
>>
>> See [1] for more details.
>>
>> [1] https://git-scm.com/docs/git-maintenance
>>
> 
> Thanks Stolee, I've known about the existence of this system for a
> while, but I can't quite figure out what's recommended for who, when,
> given the doc at https://git-scm.com/docs/git-maintenance

Thanks for the feedback that this document could use a clearer
high-level description for recommended ways to use the command, and
_when_.

One goal when creating the documentation was to _not_ recommend a
specific use pattern, instead focusing on the many ways a user could
customize their maintenance patterns. Perhaps the feature has
stabilized enough (and shown its benefits) that we could add a
recommended use section.
 
> Clearly on Windows, one reason to do "git maintenance start" is to
> avoid foregrounded "git gc --auto" runs later. That's a clear enough
> benefit to say "frequent users of large repos on windows *should* run
> 'git maintenance start' (or have some setup process or GUI do it for
> them) on those large repos".
> 
> Is there a corresponding tangible benefit on MacOS and/or Linux, over
> simply letting "git gc --auto" do its backgrounded thing when it feels
> like it? Or is there an eventual plan to *switch* from the current
> "git gc --auto" spawning to a "git maintenance start" execution when
> trigger conditions are met? Are there any *dis*advantages to running
> "git maintenance start" in general or on any given platform?

For large repositories, the default 'git gc --auto' takes a lot of
resources to rewrite all object data into a single pack-file. The
background maintenance does smaller, incremental repacks. Here,
"large" means "more than 2GB of packed object data", since that's
the default limit for the incremental repacks starting a new pack.

There are other benefits, too: it does hourly prefetches, getting
object data from remotes before the user requests a ref update
through 'git fetch' or 'git pull'. Those foreground operations
speed up as a result.
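
A minimal sketch of the repository-level settings involved, assuming
the behavior documented for 'git maintenance register' (which
'git maintenance start' performs before updating the scheduler). It
uses a scratch repository and sets the two config values by hand, so
nothing is written to the global config or the system scheduler. The
scratch path is purely illustrative.

```shell
set -eu

# Scratch repository, so the example touches nothing real.
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# In a real repository you would simply run:
#   git maintenance start
# That also registers the repo in your global config and installs the
# background schedule. Here we only set, by hand, the two repo-level
# values it is documented to write.

# Do not run automatic maintenance at the end of foreground commands.
git config maintenance.auto false

# Documented incremental strategy: hourly prefetch and commit-graph,
# daily loose-objects and incremental-repack, gc disabled.
git config maintenance.strategy incremental

# Read the settings back.
git config --get maintenance.auto
git config --get maintenance.strategy
```

Setting these by hand does not schedule anything; the scheduler entry
is what 'git maintenance start' adds on top.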

> For "my users", I have something like Scalar that can start the
> maintenance on the repo where it's needed - but it seems like there
> will be lots of users out there in the world who clone things like the
> linux repo, which looks like it is big enough to warrant these kinds
> of concerns, but it doesn't seem obvious that anyone will ever find
> "https://git-scm.com/docs/git-maintenance" and decide to run "git
> maintenance start" on their own...

We do what we can to advertise these kinds of features, but at some
point users need to self-discover things. But that's also a motivation
for the Scalar command: the user can give up some control to allow the
Scalar command to choose those recommended settings on behalf of the
user.

> As I noted in another email, I propose to replace "Auto packing the
> repository for optimum performance" with something like "Auto packing
> the repository for optimum performance; to run this kind of
> maintenance in the background, see 'git maintenance' at
> https://git-scm.com/docs/git-maintenance." - but I imagine I'm missing
> a bigger picture / a long-term plan for how these two mechanisms
> should interact.

A message that points out 'git maintenance' like this might work best
as part of the "advice" API, so those who don't want to see the
message every time could disable it.

> My apologies if I've missed one or many conversations about this on
> the list, but maybe a pointer here can also help me add directional
> hints at https://git-scm.com/docs/git-maintenance for "outside users"?

I'm trying to think of a builtin whose documentation has such strong
"recommended use" language.

The best I can think of are commands with substantial "examples"
sections, such as 'git bundle'.

A more radical approach would be to create a new doc type that
provides recommendations for how to manage large repositories. I
imagine it would be sorted in order of increasing complexity,
something like:

 1. Use 'scalar' and see if it works for your needs.

 2. Self-serve with 'git maintenance start', 'git sparse-checkout',
    partial clone, and feature.manyFiles=true as needed.

 3. Go deep on individual plumbing commands and config options
    that provide knobs to tweak how Git manages information.

I think starting with some examples or a "recommended use" section
for 'git maintenance' would be a better first step.
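
As a rough sketch of the self-serve option in item 2 above, the
following uses a scratch repository with made-up 'src' and 'docs'
directories (both hypothetical) to show feature.manyFiles and a
cone-mode sparse-checkout. Partial clone needs a real remote, so it
appears only as a comment.

```shell
set -eu

# Scratch repository with two top-level directories (names are illustrative).
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"
mkdir -p src docs
echo 'int main(void) { return 0; }' > src/main.c
echo 'notes' > docs/notes.txt
git add .
git -c user.name=Example -c user.email=example@example.com \
    commit --quiet -m 'Initial commit'

# Opt in to the settings Git recommends for repositories with many files.
git config feature.manyFiles true

# Restrict the working tree to 'src' (plus files at the root) in cone mode.
# 'init --cone' is kept here for older Git versions.
git sparse-checkout init --cone
git sparse-checkout set src

# Partial clone applies at clone time, for example:
#   git clone --filter=blob:none <url>

# Show the active sparse-checkout directories.
git sparse-checkout list
```

After this, 'docs' is no longer present in the working tree but is
still tracked in the repository.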

Thanks,
-Stolee


