On Tue, Aug 24, 2021 at 11:30 AM Derrick Stolee <stolee@xxxxxxxxx> wrote: > > On 8/20/2021 5:32 PM, Elijah Newren wrote: > > On Tue, Aug 17, 2021 at 10:08 AM Derrick Stolee via GitGitGadget > > <gitgitgadget@xxxxxxxxx> wrote: > >> > >> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > >> > >> The diff_populate_filespec() method is used to describe the diff after a > >> merge operation is complete, especially when a conflict appears. In > >> order to avoid expanding a sparse index, the reuse_worktree_file() needs > >> to be adapted to ignore files that are outside of the sparse-checkout > >> cone. The file names and OIDs used for this check come from the merged > >> tree in the case of the ORT strategy, not the index, hence the ability > >> to look into these paths without having already expanded the index. > > > > I'm confused; I thought the diffstat was only shown if the merge was > > successful, in which case there would be no conflicts appearing. > > That's my mistake. I'll edit the message accordingly. > > > Also, I'm not that familiar with the general diff machinery (just the > > rename detection parts), but...if the diffstat only shows when the > > merge is successful, wouldn't we be comparing two REVS (ORIG_HEAD to > > HEAD)? Why would we make use of the working tree at all in such a > > case? And, wouldn't using the working tree be dangerous...what if > > there was a merge performed with a dirty working tree? > > > > On a bit of a tangent, I know diffcore-rename.c calls into > > diff_populate_filespec() as well, and I have some code doing merges in > > a bare repository (where there obviously is no index). It seemed to > > be working, but given this commit message, now I'm wondering if I've > > missed something fundamental either in that implementation or there's > > something amiss in this patch. Or both. Maybe I need to dig into > > diff_populate_filespec() more, but it seems you already have. Any > > pointers to orient me on why your changes are right here (and, if you > > know, why diffcore-rename.c should or shouldn't be using > > diff_populate_filespec() in a bare repo)? > > I think the cases you are thinking about are covered by this > condition before the one I'm inserting: > > /* We want to avoid the working directory if our caller > * doesn't need the data in a normal file, this system > * is rather slow with its stat/open/mmap/close syscalls, > * and the object is contained in a pack file. The pack > * is probably already open and will be faster to obtain > * the data through than the working directory. Loose > * objects however would tend to be slower as they need > * to be individually opened and inflated. > */ > if (!FAST_WORKING_DIRECTORY && !want_file && has_object_pack(oid)) > return 0; > > or after: > > /* > * Similarly, if we'd have to convert the file contents anyway, that > * makes the optimization not worthwhile. > */ > if (!want_file && would_convert_to_git(istate, name)) > return 0; > > (This makes me think that I should move my new condition further down > so these two can be linked by context.) > > Sounds like this is just an optimization, so it is fine to opt out of it > if we think the optimization isn't necessary. Outside of the sparse-checkout > cone qualifies. Ah, this is very helpful. Thanks for digging up these details.