Re: libification: how to avoid symbol collisions?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 9, 2024 at 9:10 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Kyle Lippincott <spectral@xxxxxxxxxx> writes:
>
> > If I'm right that this is an issue, does this imply that we'd need to
> > rename every non-static function in the git codebase that's part of a
> > library to have a `git_` prefix, even if it won't be used outside of
> > the git project's own .c files? Is there a solution that doesn't
> > involve making it so that we have to type `git_` a billion times a day
> > that's also maintainable? We could try to guess at how likely a name
> > collision would be and only do this for ones where it's obviously
> > going to collide, but if we get it wrong, I'm concerned that we'll run
> > into subtle ODR violations that *at best* erode confidence in our
> > library, and can actually cause outages, data corruption, and
> > security/privacy issues.
>
> If you end up with a helper function name "foo" that is defined in
> our X.c and used by our Y.c but is not part of the published "git
> library API", we may want to rename it so that such a common name
> can be used by programs that link with the "git library".  We may
> choose to rename it to "GitLib_foo".

If it's internal, we may want to name it with a different prefix than
GitLib, if we expect the exposed API of the library to have this
prefix, just as a signal to readers where the internal/external
boundaries are.

>
> Do we want to keep the source of our library code, which defines the
> helper function "foo" in X.c and calls it in Y.c, intact so that the
> helper is still named "foo" as far as we are concerned?  Or do we
> "bite the bullet" and bulk modify both the callers and the callee?
>
> I'd imagine that we would rather avoid such a churn at all cost [*].
> After all, "foo" is *not* supposed to be callable by any human
> written code, and that is why we rename it to a name "GitLib_foo"
> that is unlikely to overlap with any sane human would use.
>
>         Side note: if a public API function that we want our library
>         clients to call is misnamed, we want to rename it so that we
>         would both internally and externally use the same public
>         name, I would imagine.
>
> The mechanics to rename should be a solved problem, I think, as we
> are obviously not the first project that wants to build a library.
>
> If the names are all simple, we could do this in CPP,

At first I thought you meant C++, and I was like "Yeah, that's a
possible solution: when building a library, compile it as C++ with
name mangling, except for the symbols we intend to export!" -- this
was not what you meant, though. Kind of amusingly, that idea might
work, and might even be maintainable once we got to that state, but
getting to that state would be a lot of cleanup because of C++'s
stricter type system (`char *p = ptr;`, where `ptr` is a `void*` for
example; maybe this is a call to malloc or similar). Since the git
libraries don't exist yet, there's technically no worries about
backwards compatibility with requiring a C++ compiler.

> i.e. invent a
> header file that has bunch of such renames like
>
>     #define foo GitLib_foo
>
> and include it in both X.c and Y.c.  But "foo" may also appear as
> the name of a type, a member in a structure, etc., where we may not
> want to touch, so in a project larger than toy scale, this approach
> may not work well.

Glancing at the tags file, it looks like there's a small number of
cases where this would be problematic, and they're mostly things where
there's a function named the same thing as either a struct variable
storing the result of the function. So it could work, but there's over
3,500 symbols (if I did my filtering of the tags file correctly) that
are not scoped to a specific file (i.e. static), or
struct/enum/typedef/union names. That's going to be quite annoying to
maintain; even if we don't end up having to do all 3,500 symbols, for
the files that are part of some public library, we'd add maintenance
burden because we'd need to remember to either make every new function
be static, or add it to this list. I assume we could create a test
that would enforce this ("static, named with <prefix>, or added to
<list>"), so the issue is catchable, but it will be exceedingly
annoying every time one encounters this.

>
> "objcopy --redefine-sym" would probably be a better way.  I haven't
> written a linker script, but I heard rumors that there is RENAME()
> to rename symbols, and using that might be another avenue.
>
>

I'd thought of linker scripts, but rejected the idea due to
assumptions I made about their portability - this could be mitigated
by having a linker-script-generator step in the build process, but
this seems difficult to maintain. It also implies the same maintenance
burden as the #defines, where when introducing a new function to X.c
that is called from Y.c we'd have to edit the list of "symbols to
rename".





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux