On Fri, Feb 9, 2024 at 9:10 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Kyle Lippincott <spectral@xxxxxxxxxx> writes:
>
> > If I'm right that this is an issue, does this imply that we'd need to
> > rename every non-static function in the git codebase that's part of a
> > library to have a `git_` prefix, even if it won't be used outside of
> > the git project's own .c files? Is there a solution that doesn't
> > involve making it so that we have to type `git_` a billion times a day
> > that's also maintainable? We could try to guess at how likely a name
> > collision would be and only do this for ones where it's obviously
> > going to collide, but if we get it wrong, I'm concerned that we'll run
> > into subtle ODR violations that *at best* erode confidence in our
> > library, and can actually cause outages, data corruption, and
> > security/privacy issues.
>
> If you end up with a helper function name "foo" that is defined in
> our X.c and used by our Y.c but is not part of the published "git
> library API", we may want to rename it so that such a common name
> can be used by programs that link with the "git library". We may
> choose to rename it to "GitLib_foo".

If it's internal and we expect the library's exposed API to use the
GitLib prefix, we may want to give it a different prefix, just to
signal to readers where the internal/external boundary is.

>
> Do we want to keep the source of our library code, which defines the
> helper function "foo" in X.c and calls it in Y.c, intact so that the
> helper is still named "foo" as far as we are concerned? Or do we
> "bite the bullet" and bulk modify both the callers and the callee?
>
> I'd imagine that we would rather avoid such a churn at all cost [*].
> After all, "foo" is *not* supposed to be callable by any human
> written code, and that is why we rename it to a name "GitLib_foo"
> that is unlikely to overlap with any sane human would use.
>
>     Side note: if a public API function that we want our library
>     clients to call is misnamed, we want to rename it so that we
>     would both internally and externally use the same public
>     name, I would imagine.
>
> The mechanics to rename should be a solved problem, I think, as we
> are obviously not the first project that wants to build a library.
>
> If the names are all simple, we could do this in CPP,

At first I thought you meant C++, and I was like "Yeah, that's a
possible solution: when building a library, compile it as C++ with name
mangling, except for the symbols we intend to export!" -- this was not
what you meant, though. Kind of amusingly, that idea might work, and
might even be maintainable once we got to that state, but getting there
would take a lot of cleanup because of C++'s stricter type system (for
example, `char *p = ptr;` where `ptr` is a `void *`, perhaps the result
of a call to malloc or similar, is valid C but needs an explicit cast
in C++). Since the git libraries don't exist yet, there are technically
no backwards-compatibility worries about requiring a C++ compiler.

> i.e. invent a
> header file that has bunch of such renames like
>
>     #define foo GitLib_foo
>
> and include it in both X.c and Y.c. But "foo" may also appear as
> the name of a type, a member in a structure, etc., where we may not
> want to touch, so in a project larger than toy scale, this approach
> may not work well.

Glancing at the tags file, it looks like there are only a small number
of cases where this would be problematic, and they're mostly cases
where a function has the same name as a struct variable storing the
result of that function. So it could work, but there are over 3,500
symbols (if I filtered the tags file correctly) that are neither scoped
to a specific file (i.e. static) nor struct/enum/typedef/union names.
That's going to be quite annoying to maintain. Even if we don't end up
having to handle all 3,500 symbols, the files that are part of some
public library would carry an extra maintenance burden: we'd need to
remember to either make every new function static or add it to this
list. I assume we could create a test that enforces this ("static,
named with <prefix>, or added to <list>"), so the issue is catchable,
but it will be exceedingly annoying every time one encounters it.

>
> "objcopy --redefine-sym" would probably be a better way. I haven't
> written a linker script, but I heard rumors that there is RENAME()
> to rename symbols, and using that might be another avenue.

I'd thought of linker scripts, but rejected the idea because of
assumptions I made about their portability. That could be mitigated by
adding a linker-script-generator step to the build process, but this
seems difficult to maintain. It also implies the same maintenance
burden as the #defines: whenever we introduce a new function in X.c
that is called from Y.c, we'd have to edit the list of "symbols to
rename".