Hi Son,

On Wed, Jun 17, 2020 at 10:58 AM Son Luong Ngoc <sluongng@xxxxxxxxx> wrote:
>
> Hi Elijah,
>
> On Wed, Jun 17, 2020 at 09:48:22AM -0700, Elijah Newren wrote:
> >
> > An aside, though, since you linked to the in-tree sparse-checkout
> > definitions: When I reviewed that series, the possibility of merge
> > conflicts and not knowing what sparse-checkout should have checked out
> > when the in-tree definitions themselves were in a conflicted state
> > seemed to me to be a pretty tough sticking point. I'm hoping someone
> > has a clever solution, but I still don't yet. Do you?
>
> I am no clever person, but I often take great pleasure in reading up
> works of smarter people. One of which is the Google's and Facebook's Mercurial
> extension sets that they opensourced a while ago to support large repos.
>
> The test suite for FB's 'sparse' extension[1] may address your concerns?
>
> The 'sparse' extension defines the sparse checkout definition of a
> working repository. It supports '--enable-profile' which take in definition
> files ('.sparse'). These profiles are often checked into the root dir
> of the repo.
>
> [1]: https://bitbucket.org/facebook/hg-experimental/src/05ed5d06b353aca69551f3773f56a99994a1a6bf/tests/test-sparse-profiles.t#lines-115

Ooh, interesting; thanks for the link. It provides an idea, though
I'm not completely sure how it maps to our implementation. The test
file says that during a merge you get "unioned files". It's not fully
clear what union means, especially when the files have both includes
and excludes. For example, does the union of matches mean a union of
includes and an intersection of excludes? Also, digging a bit
further, it appears mercurial requires all includes to be before all
excludes[2]. But git's pattern specification used in
.git/info/sparse-checkout (taken from .gitignore rules) allows
includes and excludes to be arbitrarily interspersed, so what is an
appropriate union in our case?

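To make the ordering concern concrete, here is a toy sketch. It is not
git's matcher: it assumes a simplified model in which patterns are
checked in order, a leading "!" negates, and the last matching pattern
wins (as in .gitignore), with Python's fnmatch standing in for the real
wildcard rules. The helper name `included` and the sample patterns are
made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import List


def included(path: str, patterns: List[str]) -> bool:
    """Simplified last-match-wins evaluation; NOT git's real matcher."""
    result = False
    for pat in patterns:
        negated = pat.startswith("!")
        body = pat[1:] if negated else pat
        if fnmatchcase(path, body):
            # A plain pattern includes the path, a "!" pattern excludes it.
            result = not negated
    return result


# Hypothetical definitions from the two sides of a merge.
side_a = ["docs/*", "!docs/internal/*"]
side_b = ["docs/internal/*"]
path = "docs/internal/notes.txt"

a_then_b = included(path, side_a + side_b)
b_then_a = included(path, side_b + side_a)
print(a_then_b, b_then_a)
```

Under this model, simply concatenating the two pattern lists gives a
different answer depending on which side goes first, so "union" needs
a more careful definition once includes and excludes can be mixed.
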
(Can we sidestep this question by limiting the in-tree sparsity
definitions to cone mode only, which then only have includes in the
form of directory names, since that'd allow easy "unioning"?)

A little more digging suggests that mercurial also only allows sparse
definitions to be read from commits, not from the working tree[3].
That seems bad to me; it's too much of a pain for users who want to
edit and test changes. Sure, if their first commit is bad they could
`git commit --amend` after the fact, but I don't like forcing them
through that workflow. (This is perhaps especially true if they're
trying to fix the definition during a rebase; they shouldn't have to
commit first to get a corrected sparsity definition, especially as
that can easily mess up rebase state.)

However, although I don't like reading sparsity definitions from
commits rather than the working tree, it probably did have an
advantage in that it made it easier for mercurial folks to notice the
union idea: since they only get sparsity patterns from revisions, they
are kind of forced into thinking about getting them from both parents
and then "doing a union".

Anyway, following that logic, it'd be tempting to say that we limit
the in-tree definitions to cone mode, and then if any of the
definitions have conflicts then we just load stages 2 and 3 of the
file and union them. But...what if stages 2 and 3 also have conflict
markers in them (either because of recursive merges or the more
involved rename/rename(2to1) cases)? How do we ensure a well-defined
"union" of values?

I guess a similar question is what if users, while editing, fill the
sparse definition file with syntax errors -- and maybe even commit it.
Do we sparsify down to nothing? Expand out to everything? Ignore the
lines that don't otherwise parse and just use the rest? Something
else?

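As a rough sketch of the "load stages 2 and 3 and union them" idea
under the cone-mode restriction: the code below assumes a hypothetical
in-tree format of one directory name per line, and it picks just one
of the policies listed above (ignore lines that don't parse, but
report them). The function names and file format are assumptions for
illustration, not anything git implements.

```python
from typing import List, Tuple

# Line prefixes left behind by an unresolved (possibly nested) merge.
CONFLICT_PREFIXES = ("<<<<<<<", "|||||||", "=======", ">>>>>>>")


def parse_cone_dirs(text: str) -> Tuple[List[str], List[str]]:
    """Return (directories, rejected_lines) for the assumed format."""
    dirs: List[str] = []
    rejected: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no information
        if (line.startswith(CONFLICT_PREFIXES)
                or line.startswith("!")
                or any(ch in line for ch in "*?[")):
            rejected.append(raw)  # not a plain directory name
            continue
        name = line.strip("/")
        if name:
            dirs.append(name)
    return dirs, rejected


def union_cone_dirs(ours: str, theirs: str) -> Tuple[List[str], List[str]]:
    """Union the directory sets of both sides; collect unparsable lines."""
    ours_dirs, ours_bad = parse_cone_dirs(ours)
    theirs_dirs, theirs_bad = parse_cone_dirs(theirs)
    return sorted(set(ours_dirs) | set(theirs_dirs)), ours_bad + theirs_bad


# Hypothetical stage contents; "theirs" still carries nested markers.
ours_text = "src/\ndocs/\n"
theirs_text = "src/\n<<<<<<< HEAD\ntools/\n=======\nbuild/\n>>>>>>> topic\n"
merged, rejected = union_cone_dirs(ours_text, theirs_text)
print(merged)
print(rejected)
```

In a real repository the two inputs would come from something like
`git show :2:<path>` and `git show :3:<path>`. Note that skipping
marker lines quietly unions both halves of a nested conflict as well,
which may or may not be what we want; the rejected list is there so a
caller could warn instead of silently guessing.
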
The one other thing I noticed of interest from mercurial's sparsify
was that it apparently suffers from the same problems we used to in
git < 2.27.0: inability to update sparsity definitions when there are
any dirty changes[4]. That was a huge pain point; I'm glad we're not
stuck with that anymore.

Anyway, the mercurial link certainly provides some ideas even if it
doesn't answer all the questions. Thanks for pointing it out.

Elijah

[2] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_59
[3] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_123
[4] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_485
    https://fossies.org/linux/mercurial/mercurial/sparse.py#l_526