On 11/2/2020 3:43 PM, Elijah Newren wrote: > +static void setup_path_info(struct merge_options *opt, > + struct string_list_item *result, > + const char *current_dir_name, > + int current_dir_name_len, > + char *fullpath, /* we'll take over ownership */ > + struct name_entry *names, > + struct name_entry *merged_version, > + unsigned is_null, /* boolean */ > + unsigned df_conflict, /* boolean */ > + unsigned filemask, > + unsigned dirmask, > + int resolved /* boolean */) > +{ > + struct conflict_info *path_info; In addition to my concerns below about 'conflict_info' versus 'merged_info', I was doubly confused that 'result' in the parameter list is given a variable named 'pi' for "path info" and result->util eventually is equal to this path_info. What if we renamed 'result' to 'pi' for "path info" here, then operated on 'pi->util' in this method? > + path_info = xcalloc(1, resolved ? sizeof(struct merged_info) : > + sizeof(struct conflict_info)); Hm. I'm happy to have a `struct merged_info *` pointing to a `struct conflict_info`, but the opposite seems very dangerous. Perhaps we should always use sizeof(struct conflict_info)? We can use path_info->merged.clean to detect whether the rest of the data is worth looking at. (Or, in your case, whether or not it is allocated.) I imagine that in a large repo we will need many of these structs, but very few of them will actually need to be conflicts, so using 'struct conflict_info' always will lead to memory bloat. But in that case, would we not be better off with an array instead of a scattering of data across the heap? Perhaps 'struct conflict_info' shouldn't contain a 'struct merged_info' and instead be just the "extra" data. Then we could have a contiguous array of 'struct merged_info' values for most of the paths, but heap pointers for 'struct conflict_info' as necessary. It's also true that I haven't fully formed a mental model for how these are used in your algorithm, so I'll keep reading. > + path_info->merged.directory_name = current_dir_name; > + path_info->merged.basename_offset = current_dir_name_len; > + path_info->merged.clean = !!resolved; > + if (resolved) { > + path_info->merged.result.mode = merged_version->mode; > + oidcpy(&path_info->merged.result.oid, &merged_version->oid); > + path_info->merged.is_null = !!is_null; > + } else { > + int i; > + > + for (i = 0; i < 3; i++) { > + path_info->pathnames[i] = fullpath; > + path_info->stages[i].mode = names[i].mode; > + oidcpy(&path_info->stages[i].oid, &names[i].oid); > + } > + path_info->filemask = filemask; > + path_info->dirmask = dirmask; > + path_info->df_conflict = !!df_conflict; > + } > + strmap_put(&opt->priv->paths, fullpath, path_info); > + result->string = fullpath; > + result->util = path_info; This is set in all cases, so should we use it everywhere? Naturally, there might be a cost to the extra pointer indirection, so maybe we create a 'struct conflict_info *util' to operate on during this method, but set 'result->util = util' right after allocating so we know how it should behave? > @@ -91,10 +136,12 @@ static int collect_merge_info_callback(int n, > */ > struct merge_options *opt = info->data; > struct merge_options_internal *opti = opt->priv; > - struct conflict_info *ci; > + struct string_list_item pi; /* Path Info */ > + struct conflict_info *ci; /* pi.util when there's a conflict */ ... > + setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, > + names, NULL, 0, df_conflict, filemask, dirmask, 0); > + ci = pi.util; Here is the use of 'pi' that I was talking about earlier. Thanks, -Stolee