On Fri, Apr 15 2022, Phillip Wood wrote: > Hi Ævar and René > > On 15/04/2022 14:53, Ævar Arnfjörð Bjarmason wrote: >> On Fri, Apr 15 2022, René Scharfe wrote: >> >>> Am 15.04.22 um 12:21 schrieb Ævar Arnfjörð Bjarmason: >>>> Return NULL instead of possibly feeding a NULL to memset() in >>>> reftable_calloc(). This issue was noted by GCC 12's -fanalyzer: >>>> >>>> reftable/publicbasics.c: In function ‘reftable_calloc’: >>>> reftable/publicbasics.c:43:9: error: use of possibly-NULL ‘p’ where non-null expected [CWE-690] [-Werror=analyzer-possible-null-argument] >>>> 43 | memset(p, 0, sz); >>>> | ^~~~~~~~~~~~~~~~ >>>> [...] >>>> >>>> This bug has been with us ever since this code was added in >>>> ef8a6c62687 (reftable: utility functions, 2021-10-07). >>> >>> Not sure it's a bug, or limited to this specific location. AFAICS it's >>> more a general lack of handling of allocation failures in the reftable >>> code. Perhaps it was a conscious decision to let the system deal with >>> them via segfaults? >> I think it's just buggy, it tries to deal with malloc returning NULL >> in >> other contexts. > > Does it? I just quickly scanned the output of > > git grep -e 'reftable_[mc]alloc' -e 'reftable_realloc' -A2 origin/master > > and I didn't see a single NULL check I think you're right, I retrieved that information from wetware. Looking again I think I was confusing it with the if (x) free(x) fixes in 34230514b83 (Merge branch 'hn/reftable-coverity-fixes', 2022-02-16). >>> When the code is used by Git eventually it should use xmalloc and >>> xrealloc instead of calling malloc(3) and realloc(3) directly, to >>> handle allocation failures explicitly, and to support GIT_ALLOC_LIMIT. >>> This will calm down the analyzer and extend the safety to the rest of >>> the reftable code, not just this memset call. >> Yeah, its whole custom malloc handling should go away. > > Isn't it there so the same code can be used by libgit2 and other > things that let the caller specify a custom allocator? I don't think so, but I may be wrong. I think it's a case of over-generalization where we'd be better off with something simpler, i.e. just using our normal allocation functions. That still allows for taking this code and plugging it into any custom allocator. If you simply want to compile some code such as this to use another allocator you can easily do that via the linker, as e.g. git.git's build process allows you to do with nedmalloc. So presumably if libgit2 would want to use this they'd just do that via their build process. I.e. they'd have "malloc" point to a symbol that would resolve to a shimmy layer that would dispatch to functions using this: https://github.com/libgit2/libgit2/blob/main/src/util/alloc.c The reason you'd use these sorts of pluggable malloc functions is, I think, a few: 1. You are a library like libgit2, as opposed to libreftable itself, so you want to provide this sort of "set the alloc pointers to this" now. But in this case we either always want malloc/free (git.git) or the "parent" can provide these wrappers (libgit2). 2. You want to support switching dynamically (or not-globally), or have N instances with different malloc, as e.g. the PCREv2 API does: cbe81e653fa (grep/pcre2: move back to thread-only PCREv2 structures, 2021-02-18). For this code I think that's thoroughly in YAGNI territory. 3. Your distribution model can't assume the user can adjust this via linking, e.g. C source that's intended to be #included as-is (although there you could use defines...) etc. But maybe I'm missing something. But a really good reason not to do this and just rely on the link-time overriding is: A. Things look the same, and benefit from more general fixes (although to be fair the x*alloc() wrappers would work either way..) B. You don't get subtle bugs where you forget some_custom_malloc() and then use a generic free(). Having looked just now reftable/ has at least one such submarine-segfault waiting to happen. C. If you don't actually need the hyper-customize pluggable model of say PCREv2 where the library is trying to support having two patterns in memory managed by different allocators, or other such advanced use-cases it's just extra complexity. Some of this was discussed for xdiff/* in this thread: https://lore.kernel.org/git/220216.86leybszht.gmgdl@xxxxxxxxxxxxxxxxxxx/