Elijah Newren <newren@xxxxxxxxx> wrote on Wed, Sep 28, 2022 at 13:38:
>
> > > +People might also end up wanting behavior B due to complex inter-project
> > > +dependencies. The initial attempts to use sparse-checkouts usually
> > > +involve the directories you are directly interested in plus what those
> > > +directories depend upon within your repository. But there's a monkey
> > > +wrench here: if you have integration tests, they invert the hierarchy:
> > > +to run integration tests, you need not only what you are interested in
> > > +and its dependencies, you also need everything that depends upon what
> > > +you are interested in or that depends upon one of your
> > > +dependencies...AND you need all the dependencies of that expanded group.
> > > +That can easily change your sparse-checkout into a nearly dense one.
> >
> > In my experience, the downstream dependencies are checked via builds in
> > the cloud, though that doesn't help if they are source dependencies and
> > you make a breaking change to an API interface. This kind of problem is
> > absolutely one of system architecture and I don't know what Git can do
> > other than to acknowledge it and recommend good patterns.
>
> I was talking about (source) dependencies between
> modules/projects/whatever-you-want-to-call-the-subcomponents of your
> repository. We have hundreds of modules, with various cross-module
> dependencies that evolve over time.
>
> I get the feeling from your description that your intra-repository
> dependencies between modules/projects/whatever are much more static
> for you than what we deal with. (Which is a good thing; it'd be nice
> if ours were more static.)
>
> > In a properly-organized project, 95% of engineers in the project can have
> > a small sparse-checkout, then 5% work on the common core that has these
> > downstream dependencies and require a large sparse-checkout definition.
>
> "In a properly-organized project"?
> I'm unsure if this is an
> indictment of some of the repositories I deal with in reality (and to
> be fair, it might be a totally fair indictment), or if your statement
> is starting to cross into "No true scotsman" territory. ;-)
>
> I would probably lean towards the former (we know it's more messy than
> it should be), but I'm a bit puzzled that you'd just brush aside my
> mention of integration tests. We have people who want to run
> integration tests locally, even when only modifying a small area of
> the codebase. These users are not doing cross-tree work, rather they
> are doing cross-tree testing in conjunction with their work. Running
> such tests requires a build of the modules across the repository,
> which naively would push folks into a dense checkout...and really long
> local builds. We want fast local builds, and sparse-checkouts help us
> achieve that...but it does mean we have to be clever about how we
> build in order to let these users run integration tests. (And we have
> to make it easy for users to discover the relevant integration tests,
> and sometimes associated code components that depend on what they are
> changing, which is where behavior B comes in).
>
> > There's nothing Git can do to help those engineers that do cross-tree
> > work.
>
> I'm going to partially disagree with this, in part because of our
> experience with many inter-module dependencies that evolve over time.
> Folks can start on a certain module and begin refactoring. Being
> aware that their changes will affect other areas of the code, they can
> do a search (e.g. "git grep --cached ..." to find cases outside their
> current sparse checkout), and then selectively unsparsify to get the
> relevant few dozen (or maybe even few hundred) modules added. They
> aren't switching to a dense checkout, just a less sparse one. When
> they are done, they may narrow their sparse specification again.
> We have a number of users doing cross-tree work who are using
> sparse-checkouts, and who find it productive and say it still speeds
> up their local build/test cycles.
>
> So, I'd say that ensuring Git supports behavior B well in
> sparse-checkouts, is something Git can do to help out both some of the
> engineers doing cross-tree work, and some of the engineers that are
> doing cross-tree testing.
>
> (For full disclosure, we also have users doing cross-tree work using
> regular dense checkouts and I agree there's not a lot we can do to
> help them.)
>

Let me guess what the cross-tree users of sparse-checkout gain from it:

1. They don't have to download all of the repository's blobs at once.
2. Their working tree can be easily resized.
3. They could use something like sparse-index to speed up git commands.

But the size of the repository's blobs is still worth worrying about:
even the blobs in the monorepo's HEAD alone may be too big for the
user's local machine to handle.

Perhaps it would make more sense to move this integration testing
work to a remote server. I am not sure if these ideas are feasible:

1. Mount the large git repo from the server locally.
2. Just ssh to a remote server to run integration tests.
3. Use an external tool to run integration tests on the remote server.

>
> Anyway, we do not want the behavior of `--restrict` for these
> commands. That would imply not providing conflicts to users for them
> to resolve unless they are contained within the sparse specification,
> which would clearly be broken. We instead chose to write out files
> with conflicts regardless of whether they are outside the sparse
> specification. This modified behavior I gave the name of
> `--restrict-unless-conflict`, but we don't need or want an actual
> command line flag for that. I think the behavior should just remain
> hardcoded into these commands.
>
> (Note: these commands are among those that make me think
> --[no-]restrict or --[un]focus or whatever might not make sense as a
> git global option: `--restrict-unless-conflict` behavior is the
> default for these and in fact the only sensible option, I think. If
> there's only one sensible option, no actual flag names are needed.)
>
> > The only thing I can think about is that the diffstat might want to show
> > the stats for the conflicted files, in which case that's an important
> > perspective on the distinction from --restrict.
>
> We only show the diffstat on a successful merge, so there's no
> diffstat to show if there are any conflicted files.
>

Sorry, I have a question here: how does git merge know there are no
conflicts without downloading the blobs?

> > Perhaps something like "scope" would describe the set of things we care
> > about, but use a text mode:
> >
> > --scope=sparse (--restrict)
> > --scope=all (--no-restrict)
> >
> > But I'm notoriously bad at naming things.
>
> Yeah, me too. Naming things is one of the two hard problems in
> computer science, right? (The others being cache invalidation, and
> off-by-one errors.)
>
> However, in this case, your suggestion sounds pretty decent to me.
> I'll add it to the list for us to consider.
>

Agreed.

Thanks,
--
ZheNing Hu