On Thu, Nov 12, 2020 at 2:30 PM Elijah Newren <newren@xxxxxxxxx> wrote: > > On Thu, Nov 12, 2020 at 12:15 PM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote: > > > > > + /* > > > + * Now we've got an OID for last_directory in dir_info. We need to > > > + * add it to info->versions for it to be part of the computation of > > > + * its parent directories' OID. But first, we have to find out what > > > + * its' parent name was and whether that matches the previous > > > + * info->offsets or we need to set up a new one. > > > + */ > > > + prev_dir = info->offsets.nr == 0 ? NULL : > > > + info->offsets.items[info->offsets.nr-1].string; > > > + if (new_directory_name != prev_dir) { > > > + uintptr_t c = info->versions.nr; > > > + string_list_append(&info->offsets, > > > + new_directory_name)->util = (void*)c; > > > + } > > > > Because of the possible jump of 2 or more directories that I mentioned > > earlier, there may be gaps in the offsets. So it makes sense that we > > sometimes need to insert an intermediate one. > > > > I wonder if the code would be clearer if we had explicit "begin tree" > > and "end tree" steps just like in list-objects-filter.c (LOFS_BEGIN_TREE > > and LOFS_END_TREE). Here we have "end tree" (because of the way the > > entries were sorted) but not "begin tree". If we had "begin tree", we > > probably would be able to create the necessary offsets in a loop at that > > stage, and the reasoning about the contents of the offsets would not be > > so complicated. > > > > If we really only want one side (i.e. you don't want to introduce a > > synthetic entry just to mark the end or the beginning), then my personal > > experience is that having the "begin" side is easier to understand, as > > the state is more natural and easier to reason about. (Unlike here, > > where there could be gaps in the offsets and the reader has to > > understand that the gaps will be filled just in time.) But that may just > > be my own experience. > > Interesting, I'll take a look into it. > So, I've been going through making all the changes you and Derrick suggested or highlighted...but I don't see how to tackle this one. Perhaps I'm missing something. Using your example of LOFS_BEGIN_TREE and LOFS_END_TREE from list-objects-filter.c, I note that you handle it as part of traverse_trees(), and thus you have a very natural "I'm going to process this tree" point and "I'm done processing this tree" point. There is no equivalent mapping to merge-ort that I can figure out. merge-ort does use traverse_trees() in collect_merge_info(), and fills opt->priv->paths with all full pathnames (both files and directories) found in any of the three trees. But I cannot process files/directories at that time; rename detection needs traverse_trees() to be finished to have all paths so far. Further, the list of pathnames from traverse_trees is not necessarily complete; additional paths could be added by any of * Directory/file conflicts (need to move the file to a different location) * Directory/submodule conflicts (need to move something to a different location) * Add/add conflicts of files of different types (e.g. symlink/regular file; can't just content merge them with conflict markers) * Directory rename detection (can move new files or even directories on one side of history into a new directory on other side) Thus, after traverse_trees() ends, my rename detection stuff can add paths (including new directories), then process_entries() can add paths -- and remove some when the resolution is to delete. And the code here in question runs as part of the process_entries() loop. Now, we'd still be able to create synthetic BEGIN_TREE markers if we operated in lexicographic ordering, but process_entries() *must* operate in _reverse_ lexicographic ordering because: 1) subtrees need to be written out before trees are; hashes of those subtrees are used in the parent tree 2) it's the only sane way to handle directory/file conflicts; I need to know if all entries under the directory resolve to nothing; if not, the directory is still in the way when it comes time to process the file. Granted, I could do some tricky calculations based on the reverse lexicographic ordering of fullpaths (and their containing directories) to figure out where trees begin and end -- but that takes us to exactly what I *did* do. It was precisely this step that you thought should be made simpler, but I'm not seeing how to avoid it. For now, I'll keep the code as-is, but add more comments to both the data structure and the code. If I've missed something about how I could make use of your BEGIN_TREE idea, let me know and I'll look at it again.