Derrick Stolee wrote: > On 6/29/2022 12:58 PM, Victoria Dye via GitGitGadget wrote: >> From: Victoria Dye <vdye@xxxxxxxxxx> >> >> Replace 'README.md' with 'technical/scalar.txt' (still in 'contrib/'). In >> addition to reformatting for asciidoc, elaborate on the background, purpose, >> and design choices that went into Scalar. >> >> This document is intended to persist in the 'Documentation/technical/' >> directory after Scalar has been moved into the root of Git (out of > > I wonder if it is a good idea to move this into Documentation/technical/ _now_, > so we can have this document as part of "core Git's roadmap" as much as we can > say that. Part of the roadmap is moving the Scalar code out of contrib/ and > finalizing how users (or distributors) can install the 'scalar' binary. > I initially hesitated in doing this because I felt it was a bit premature, but your comment prompted me to look more closely at other docs in 'Documentation/technical'. Precedent seems to indicate that the doc should be added *before* implementation (either in the same series as the implementation [1] or separately [2]), so I'll do the same here. Thanks for the suggestion! [1] https://lore.kernel.org/git/daa9a6bcefbce977606484d502f5cfd7fca63ecc.1614111270.git.gitgitgadget@xxxxxxxxx/ [2] https://lore.kernel.org/git/20171208192636.13678-1-git@xxxxxxxxxxxxxxxxx/ > It can be helpful to include the details of what steps to take to compile and > test the 'scalar' executable. That documentation will then be updated when > Scalar moves out of contrib/. > As part of the move out of 'contrib/', I was planning on having Scalar built and installed the same as any built-in (albeit in 'bin/' - like 'gitk', 'git-cvsserver', etc. - rather than 'libexec/git-core'). In that case, there won't be any special steps needed to build/install 'scalar', so any instructions here would be temporary. I could include those instructions in the meantime, but with Scalar incomplete, I'm not sure whether that would be valuable. All that said, if later reviews/iterations lead to Scalar needing special steps to build/install, I'll definitely include them here. >> +Scalar >> +====== >> + >> +Scalar is a built-in repository management tool that optimizes Git for use in > > I think "built-in" is unnecessary here. It makes me think of Git builtins and > otherwise does not add much to the description. > >> +large repositories. It accomplishes this by helping users to take advantage of >> +advanced performance features in Git. Unlike most other Git built-in commands, >> +Scalar is not executed as a subcommand of 'git'; rather, it is built as a >> +separate executable containing its own series of subcommands. > > Good to have this distinction. > >> +Scalar is a large enough project that it is being upstreamed incrementally, >> +living in `contrib/` until it is feature-complete. So far, the following patch >> +series have been accepted: >> + >> +- `scalar-the-beginning`: The initial patch series which sets up >> + `contrib/scalar/` and populates it with a minimal `scalar` command that >> + demonstrates the fundamental ideas. >> + >> +- `scalar-c-and-C`: The `scalar` command learns about two options that can be >> + specified before the command, `-c <key>=<value>` and `-C <directory>`. >> + >> +- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand. >> + >> +Roughly speaking (and subject to change), the following series are needed to >> +"finish" this initial version of Scalar: > > I like how you have these two sections. As you move forward in this roadmap, > the diff to the document should look like you are _moving_ the sentence above. > >> +- Finish Scalar features: Enable the built-in FSMonitor in Scalar >> + enlistments and implement `scalar help`. At the end of this series, Scalar >> + should be feature-complete from the perspective of a user. > > Wow, we're so close! > >> +- Generalize features not specific to Scalar: In the spirit of >> + making Scalar configure only what is needed for large repo performance, move > > (nit: Strange wrapping on the first line) > >> + common utilities into other parts of Git. Some of this will be internal-only, >> + but one major change will be generalizing `scalar diagnose` for use with any >> + Git repository. >> + >> +- Move Scalar to toplevel: Make `scalar` a built-in component of Git by > > Here, I would say "included" instead of built-in. Or is there another term we > should use to avoid confusion with "git <cmd>" builtins. > >> + moving it out of `contrib/` and into the root of `git`. The actual change will >> + be relatively small, but this series will also contain expanded testing to >> + ensure Scalar is stable and performant. > > Perhaps "This change will also integrate Scalar into Git's test suite and CI > checks." > > You mention "performant" which makes me think that performance tests are intended > to be part of this change. It makes me think it would be interesting to have our > existing performance tests create a mode where they compare a "vanilla" Git repo > to one registered with Scalar, but otherwise runs the same tests already in the > t/perf/ test scripts. This is a wide aside so feel free to ignore me. > This is a really interesting idea! My original plan was to add some basic tests around the operations 'scalar' should (directly or indirectly) speed up. I think I'll still need to do that anyway (e.g., for things like 'scalar clone' vs 'git clone'), but I'll also try to find a (repeatable) way to compare standard repo vs. Scalar enlistment performance in the existing perf tests. >> +Finally, there are two additional patch series that exist in Microsoft's fork of >> +Git, but there is no current plan to upstream them. There are some interesting >> +ideas there, but the implementation is too specific to Azure Repos and/or VFS >> +for Git to be of much help in general. >> + >> +These still exist mainly because the GVFS protocol is what Azure Repos has >> +instead of partial clone, while Git is focused on improving partial clone: >> + >> +- `scalar-with-gvfs`: The primary purpose of this patch series is to support >> + existing Scalar users whose repositories are hosted in Azure Repos (which does >> + not support Git's partial clones, but supports its predecessor, the GVFS >> + protocol, which is used by Scalar to emulate the partial clone). >> + >> + Since the GVFS protocol will never be supported by core Git, this patch series >> + will remain in Microsoft's fork of Git. >> + >> +- `run-scalar-functional-tests`: The Scalar project developed a quite >> + comprehensive set of integration tests (or, "Functional Tests"). They are the >> + sole remaining part of the original C#-based Scalar project, and this patch >> + adds a GitHub workflow that runs them all. >> + >> + Since the tests partially depend on features that are only provided in the >> + `scalar-with-gvfs` patch series, this patch cannot be upstreamed. > > We've had a good track record of identifying when an upstream change causes a > failure in the Scalar Functional Tests and then porting that test into the Git test > suite. I expect that to continue, but it has also not happened in a while. This is > probably because the test coverage now has a significant amount of overlap. Many > of the tests were created specifically for VFS for Git and strangeness around the > virtual filesystem layer. Several of those have found interesting corner cases in > Git's sparse-checkout feature, but these have probably been resolved by now. > > Thanks for doing this work, Victoria. I'm excited to see how the project progresses > with your new plan. > > Thanks, > -Stolee