Re: [PATCH 2/2] scalar: convert README.md into a technical design doc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Derrick Stolee wrote:
> On 6/29/2022 12:58 PM, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@xxxxxxxxxx>
>>
>> Replace 'README.md' with 'technical/scalar.txt' (still in 'contrib/'). In
>> addition to reformatting for asciidoc, elaborate on the background, purpose,
>> and design choices that went into Scalar.
>>
>> This document is intended to persist in the 'Documentation/technical/'
>> directory after Scalar has been moved into the root of Git (out of
> 
> I wonder if it is a good idea to move this into Documentation/technical/ _now_,
> so we can have this document as part of "core Git's roadmap" as much as we can
> say that. Part of the roadmap is moving the Scalar code out of contrib/ and
> finalizing how users (or distributors) can install the 'scalar' binary.
> 

I initially hesitated in doing this because I felt it was a bit premature,
but your comment prompted me to look more closely at other docs in
'Documentation/technical'. Precedent seems to indicate that the doc should
be added *before* implementation (either in the same series as the
implementation [1] or separately [2]), so I'll do the same here. Thanks for
the suggestion!

[1] https://lore.kernel.org/git/daa9a6bcefbce977606484d502f5cfd7fca63ecc.1614111270.git.gitgitgadget@xxxxxxxxx/
[2] https://lore.kernel.org/git/20171208192636.13678-1-git@xxxxxxxxxxxxxxxxx/

> It can be helpful to include the details of what steps to take to compile and
> test the 'scalar' executable. That documentation will then be updated when
> Scalar moves out of contrib/.
> 

As part of the move out of 'contrib/', I was planning on having Scalar built
and installed the same as any built-in (albeit in 'bin/' - like 'gitk',
'git-cvsserver', etc. - rather than 'libexec/git-core'). In that case, there
won't be any special steps needed to build/install 'scalar', so any
instructions here would be temporary. I could include those instructions in
the meantime, but with Scalar incomplete, I'm not sure whether that would be
valuable.

All that said, if later reviews/iterations lead to Scalar needing special
steps to build/install, I'll definitely include them here.

>> +Scalar
>> +======
>> +
>> +Scalar is a built-in repository management tool that optimizes Git for use in
> 
> I think "built-in" is unnecessary here. It makes me think of Git builtins and
> otherwise does not add much to the description.
> 
>> +large repositories. It accomplishes this by helping users to take advantage of
>> +advanced performance features in Git. Unlike most other Git built-in commands,
>> +Scalar is not executed as a subcommand of 'git'; rather, it is built as a
>> +separate executable containing its own series of subcommands.
> 
> Good to have this distinction.
> 
>> +Scalar is a large enough project that it is being upstreamed incrementally,
>> +living in `contrib/` until it is feature-complete. So far, the following patch
>> +series have been accepted:
>> +
>> +- `scalar-the-beginning`: The initial patch series which sets up
>> +  `contrib/scalar/` and populates it with a minimal `scalar` command that
>> +  demonstrates the fundamental ideas.
>> +
>> +- `scalar-c-and-C`: The `scalar` command learns about two options that can be
>> +  specified before the command, `-c <key>=<value>` and `-C <directory>`.
>> +
>> +- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
>> +
>> +Roughly speaking (and subject to change), the following series are needed to
>> +"finish" this initial version of Scalar:
> 
> I like how you have these two sections. As you move forward in this roadmap,
> the diff to the document should look like you are _moving_ the sentence above.
> 
>> +- Finish Scalar features: Enable the built-in FSMonitor in Scalar
>> +  enlistments and implement `scalar help`. At the end of this series, Scalar
>> +  should be feature-complete from the perspective of a user.
> 
> Wow, we're so close!
> 
>> +- Generalize features not specific to Scalar: In the spirit of
>> +  making Scalar configure only what is needed for large repo performance, move
> 
> (nit: Strange wrapping on the first line)
> 
>> +  common utilities into other parts of Git. Some of this will be internal-only,
>> +  but one major change will be generalizing `scalar diagnose` for use with any
>> +  Git repository.
>> +
>> +- Move Scalar to toplevel: Make `scalar` a built-in component of Git by
> 
> Here, I would say "included" instead of built-in. Or is there another term we
> should use to avoid confusion with "git <cmd>" builtins.
> 
>> +  moving it out of `contrib/` and into the root of `git`. The actual change will
>> +  be relatively small, but this series will also contain expanded testing to
>> +  ensure Scalar is stable and performant.
> 
> Perhaps "This change will also integrate Scalar into Git's test suite and CI
> checks."
> 
> You mention "performant" which makes me think that performance tests are intended
> to be part of this change. It makes me think it would be interesting to have our
> existing performance tests create a mode where they compare a "vanilla" Git repo
> to one registered with Scalar, but otherwise runs the same tests already in the
> t/perf/ test scripts. This is a wide aside so feel free to ignore me.
> 

This is a really interesting idea! My original plan was to add some basic
tests around the operations 'scalar' should (directly or indirectly) speed
up. I think I'll still need to do that anyway (e.g., for things like 'scalar
clone' vs 'git clone'), but I'll also try to find a (repeatable) way to
compare standard repo vs. Scalar enlistment performance in the existing perf
tests.

>> +Finally, there are two additional patch series that exist in Microsoft's fork of
>> +Git, but there is no current plan to upstream them. There are some interesting
>> +ideas there, but the implementation is too specific to Azure Repos and/or VFS
>> +for Git to be of much help in general.
>> +
>> +These still exist mainly because the GVFS protocol is what Azure Repos has
>> +instead of partial clone, while Git is focused on improving partial clone:
>> +
>> +- `scalar-with-gvfs`: The primary purpose of this patch series is to support
>> +  existing Scalar users whose repositories are hosted in Azure Repos (which does
>> +  not support Git's partial clones, but supports its predecessor, the GVFS
>> +  protocol, which is used by Scalar to emulate the partial clone).
>> +
>> +  Since the GVFS protocol will never be supported by core Git, this patch series
>> +  will remain in Microsoft's fork of Git.
>> +
>> +- `run-scalar-functional-tests`: The Scalar project developed a quite
>> +  comprehensive set of integration tests (or, "Functional Tests"). They are the
>> +  sole remaining part of the original C#-based Scalar project, and this patch
>> +  adds a GitHub workflow that runs them all.
>> +
>> +  Since the tests partially depend on features that are only provided in the
>> +  `scalar-with-gvfs` patch series, this patch cannot be upstreamed.
> 
> We've had a good track record of identifying when an upstream change causes a
> failure in the Scalar Functional Tests and then porting that test into the Git test
> suite. I expect that to continue, but it has also not happened in a while. This is
> probably because the test coverage now has a significant amount of overlap. Many
> of the tests were created specifically for VFS for Git and strangeness around the
> virtual filesystem layer. Several of those have found interesting corner cases in
> Git's sparse-checkout feature, but these have probably been resolved by now.
> 
> Thanks for doing this work, Victoria. I'm excited to see how the project progresses
> with your new plan.
> 
> Thanks,
> -Stolee




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux