On 7/30/2021 9:52 AM, Elijah Newren wrote: > On Thu, Jul 29, 2021 at 11:27 AM Derrick Stolee via GitGitGadget > <gitgitgadget@xxxxxxxxx> wrote: ... >> Leaving these ignored files in the sparse directories makes it >> impossible to gain performance benefits in the sparse index. When we >> track into these directories, we need to know if the files are ignored >> or not, which might depend on the _tracked_ .gitignore file(s) within >> the sparse directory. This depends on the indexed version of the file, >> so the sparse directory must be expanded. > > Is this the issue I highlighted at > https://lore.kernel.org/git/CABPp-BHsiLTtv6AuycRrQ5K6q0=ZTJe0kd7uTUL+UT=nxj66zA@xxxxxxxxxxxxxx/, > the paragraph "...I thought the point of add_patterns()..." or is > there more or other things involved here? This is exactly that point. I'm not sure why I didn't pick up on your earlier concerns as something to do _immediately_, but for some reason I thought I could delay it until later. >> If users depend on ignored files within the sparse directories, then >> they have created a bad shape in their repository. Regardless, such >> shapes would create risk that changing the behavior for all cone mode >> users might be too risky to take on at the moment. Since this data shape >> makes it impossible to get performance benefits using the sparse index, >> we limit the change to only be enabled when the sparse index is enabled. >> Users can opt out of this behavior by disabline the sparse index. > > s/disabline/disabling/, otherwise, I fully agree with this paragraph. Will fix. Thanks. >> Depending on user feedback or real-world use, we might want to consider >> expanding the behavior change to all of cone mode. Since we are >> currently restricting to the sparse index case, we can use the existence >> of sparse directory entries in the index as indicators of which >> directories should be removed. > > Let's just expand it to all of cone mode. I'm open to that. I'll leave this version open for feedback a bit longer before doing so. >> + /* >> + * NEEDSWORK: For now, only use this behavior when index.sparse >> + * is enabled. We may want this behavior enabled whenever using >> + * cone mode patterns. >> + */ >> + prepare_repo_settings(r); >> + if (!r->worktree || !r->settings.sparse_index) >> + return; > > s/may/do/ :-) Maybe! >> + >> + /* >> + * Since we now depend on the sparse index to enable this >> + * behavior, use it to our advantage. This process is more >> + * complicated without it. >> + */ > > Ah, the real reason why you limited this change to sparse-index comes out. :-) > >> + convert_to_sparse(r->index); >> + >> + strbuf_addstr(&path, r->worktree); >> + strbuf_complete(&path, '/'); >> + pathlen = path.len; >> + >> + for (i = 0; i < r->index->cache_nr; i++) { >> + struct cache_entry *ce = r->index->cache[i]; >> + >> + /* >> + * Is this a sparse directory? If so, then definitely >> + * include it. All contained content is outside of the >> + * patterns. > > "include"? I would have used the word "remove", but it gets confusing > because the question is if we want to include it in our list of things > to remove or remove it from the working directory. This comment is a bit stale, sorry. Earlier I was populating a list of the directories, but in this version I'm just deleting them immediately. >> + */ >> + if (S_ISSPARSEDIR(ce->ce_mode) && >> + repo_file_exists(r, ce->name)) { >> + strbuf_setlen(&path, pathlen); >> + strbuf_addstr(&path, ce->name); >> + >> + /* >> + * Removal is "best effort". If something blocks >> + * the deletion, then continue with a warning. >> + */ >> + if (remove_dir_recursively(&path, 0)) >> + warning(_("failed to remove directory '%s'"), path.buf); > > Um, doesn't this delete untracked files that are not ignored as well > as the ignored files? If so, was that intentional? I'm fully on > board with removing the gitignore'd files, but I'm worried removing > other untracked files is dangerous. I believe that 'git sparse-checkout (set|add|reapply)' will fail before reaching this method if there are untracked files that could potentially be removed. I will double-check to ensure this is the case. It is definitely my intention to protect any untracked, non-ignored files in these directories by failing the sparse-checkout modification. > My implementation of this concept (in an external tool) was more along > the lines of > > * Get $LIST_OF_NON_SPARSE_DIRECTORIES by walking `git ls-files -t` > output and finding common fully-sparse directories > * git clean -fX $LIST_OF_NON_SPARSE_DIRECTORIES I initially was running 'git clean -dfx -- <dir> ...' but that also requires parsing and expanding the index (or being very careful with the sparse index). > >> + } >> + } >> + >> + strbuf_release(&path); >> + >> + /* >> + * This is temporary: the sparse-checkout builtin is not >> + * integrated with the sparse-index yet, so we need to keep >> + * it full during the process. >> + */ >> + ensure_full_index(r->index); >> +} >> + >> static int update_working_directory(struct pattern_list *pl) >> { >> enum update_sparsity_result result; >> @@ -141,6 +207,8 @@ static int update_working_directory(struct pattern_list *pl) >> else >> rollback_lock_file(&lock_file); >> >> + clean_tracked_sparse_directories(r); >> + >> r->index->sparse_checkout_patterns = NULL; >> return result; >> } > > (Adding a comment here just to separate the two blocks below.) > >> @@ -540,8 +608,11 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) >> { >> int result; >> int changed_config = 0; >> + struct pattern_list *old_pl = xcalloc(1, sizeof(*old_pl)); >> struct pattern_list *pl = xcalloc(1, sizeof(*pl)); >> >> + get_sparse_checkout_patterns(old_pl); >> + >> switch (m) { >> case ADD: >> if (core_sparse_checkout_cone) >> @@ -567,7 +638,9 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) >> set_config(MODE_NO_PATTERNS); >> >> clear_pattern_list(pl); >> + clear_pattern_list(old_pl); >> free(pl); >> + free(old_pl); >> return result; >> } > > You create an old_pl, populate it with get_sparse_checkout_patterns(), > do nothing with it, then clear and free it? I'm confused by the above > two blocks. Sorry that this is also stale. I should read my own patches more carefully. I was originally going to focus only on the directories that were "leaving" the sparse-checkout definition, but that did not catch the 'reapply' case or cases where the user created the directories by other means. Will delete these bits. >> + git -C repo sparse-checkout reapply && >> + test_path_is_missing repo/folder1 && >> + test_path_is_missing repo/deep/deeper2 && >> + test_path_is_dir repo/obj && >> + test_path_is_file repo/file.o && >> + >> + git -C repo status --porcelain=v2 >out && >> + test_must_be_empty out && >> + >> + git -C repo sparse-checkout set deep/deeper2 && >> + test_path_is_missing repo/deep/deeper1 && >> + test_path_is_dir repo/deep/deeper2 && > > What's the expectation for repo/obj? I will add test_path_is_dir repo/obj because this ignored directory should not be removed. This is simulating the build artifacts that might appear in ignored directories whose parents are in the sparse-checkout definition. Thanks, -Stolee