On Thu, Feb 23, 2023 at 7:29 AM Derrick Stolee <derrickstolee@xxxxxxxxxx> wrote: > > On 2/23/2023 4:14 AM, Elijah Newren via GitGitGadget wrote: > > This patch is primarily about moving internal-only fields within these two > > structs into an embedded internal struct. Patch breakdown: > > > > * Patches 1-3: Restructuring dir_struct > > * Patch 1: Splitting off internal-use-only fields > > * Patch 2: Add important usage note to avoid accidentally using > > deprecated API > > * Patch 3: Mark output-only fields as such > > * Patches 4-11: Restructuring unpack_trees_options > > * Patches 4-6: Preparatory cleanup > > * Patches 7-10: Splitting off internal-use-only fields > > * Patch 11: Mark output-only field as such > > > And after the changes: > > > > struct dir_struct { > > enum [...] flags; > > int nr; /* output only */ > > int ignored_nr; /* output only */ > > struct dir_entry **entries; /* output only */ > > struct dir_entry **ignored; /* output only */ > > struct untracked_cache *untracked; > > const char *exclude_per_dir; /* deprecated */ > > struct dir_struct_internal { > > int alloc; > > int ignored_alloc; > > #define EXC_CMDL 0 > > #define EXC_DIRS 1 > > #define EXC_FILE 2 > > struct exclude_list_group exclude_list_group[3]; > > struct exclude_stack *exclude_stack; > > struct path_pattern *pattern; > > struct strbuf basebuf; > > struct oid_stat ss_info_exclude; > > struct oid_stat ss_excludes_file; > > unsigned unmanaged_exclude_files; > > unsigned visited_paths; > > unsigned visited_directories; > > } internal; > > }; > > This does present a very clear structure to avoid callers being > confused when writing these changes. It doesn't, however, present > any way to guarantee that callers can't mutate this state. > > ...here I go on a side track thinking of an alternative... > > One way to track this would be to anonymously declare 'struct > dir_struct_internal' in the header file and let 'struct dir_struct' > contain a _pointer_ to the internal struct. The dir_struct_internal > can then be defined inside the .c file, limiting its scope. (It > must be a pointer in dir_struct or else callers would not be able > to create a dir_struct without using a pointer and an initializer > method. > > The major downside to this pointer approach is that the internal > struct needs to be initialized within API calls and somehow cleared > by all callers. The internal data could be initialized by the common > initializers read_directory() or fill_directory(). There is a > dir_clear() that _should_ be called by all callers (but I notice we > are leaking the struct in at least one place in add-interactive.c, > and likely others). > > This alternative adds some complexity to the structure, but > provides compiler-level guarantees that these internals are not used > outside of dir.c. I thought it worth exploring, even if we decide > that the complexity is not worth those guarantees. > Another approach, if you don't mind structure pointer math is to create two structures: a) the external public one in the public header file struct dir_entry { <public stuff> }; b) a private structure in the source file: struct dir_entry_private { struct dir_entry entry; <private stuff> }; In the source file you also define a macro/function that can take a pointer to dir_entry and get a pointer to dir_entry_private: struct dir_entry_private *dir_entry_to_private(struct dir_entry *entry) { return (struct dir_entry_private *)(<calculate the offset that entry inside dir_entry_private is, then subtract that from entry pointer>)) } In Linux kernel this is container_of, not sure if git has this already defined and its a common pattern. Then you can set the code up such that the only way to allocate a dir_entry is to call some function in the dir code. If a new entry needs to be allocated, implement an alloc function that has the full private structure definition. This way you don't need an extra field in the dir_entry struct at all, but at the cost of requiring special allocations. It works great for code where the only way to get a dir_entry is already some other functions that would ensure the private version is setup correctly. > The best news is that your existing series makes it easier to flip > to the internal pointer method in the future, since we can shift > the 'd->internal.member" uses into "d->internal->member" in a > mechanical way. Thus, the change you are proposing does not lock us > into this approach if we change our minds later. > I think either approach is good. I like container_of because I'm quite used to it in low level kernel code and its a good way to provide private/public split abstractions there. The private pointer variation is also a common approach to this problem and I think it sounds like we already use it in a few places. Its perhaps better for those reasons. > Thanks, > -Stolee