Re: [RFC v2 00/10] Introduce an extensible static analyzer

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Fri, 14 Oct 2022 00:00:58 +0200

On Fri, Jul 29, 2022 at 3:01 PM Alberto Faria <afaria@xxxxxxxxxx> wrote:
> Performance isn't great, but with some more optimization, the analyzer
> should be fast enough to be used iteratively during development, given
> that it avoids reanalyzing unmodified translation units, and that users
> can restrict the set of translation units under consideration. It should
> also be fast enough to run in CI (?).

I took a look again today, and the results are indeed very nice (I
sent a patch series with the code changes from this one).

The performance is not great, as you point out. :/  I made a couple of
attempts at optimizing it; for example, "actual_visitor" can be
written more efficiently like this, avoiding the stack:

    @CFUNCTYPE(c_int, Cursor, Cursor, py_object)
    def actual_visitor(node: Cursor, parent: Cursor, client_data: Cursor) -> int:
        # "exception" lives in the enclosing function; without this
        # declaration the assignment below would create a new local.
        nonlocal exception

        try:
            node.parent = client_data

            # several clang.cindex methods need Cursor._tu to be set
            node._tu = client_data._tu
            r = visitor(node)
            if r is VisitorResult.RECURSE:
                # A non-zero result means the nested visit was aborted:
                # propagate Break (0), otherwise Continue (1).
                aborted = conf.lib.clang_visitChildren(
                    node, actual_visitor, node) != 0
                return 0 if aborted else 1
            else:
                return r.value

        except BaseException as e:
            # Exceptions can't cross into C. Stash it, abort the visitation, and
            # reraise it.
            if exception is None:
                exception = e

            return VisitorResult.BREAK.value

    root.parent = None
    result = conf.lib.clang_visitChildren(root, actual_visitor, root)

    if exception is not None:
        raise exception

    return result == 0
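
For reference, here is a minimal, self-contained sketch of the same
stash-and-reraise pattern without libclang. The names
"run_callback_safely" and "trampoline" are made up for illustration, and
the callback is invoked directly from Python through its C function
pointer rather than by a C library. Note the "nonlocal" declaration,
which the nested callback needs in order to assign to the variable of the
enclosing function:

```python
from ctypes import CFUNCTYPE, c_int


def run_callback_safely(func, value):
    """Call func(value) through a C function pointer, re-raising any exception.

    Hypothetical helper: it only illustrates the pattern, it is not part of
    clang.cindex or of the analyzer.
    """
    exception = None

    @CFUNCTYPE(c_int, c_int)
    def trampoline(arg):
        # Needed so that the assignment below targets the enclosing variable.
        nonlocal exception
        try:
            return func(arg)
        except BaseException as e:
            # Exceptions can't cross into C: stash the first one and return
            # a sentinel value instead.
            if exception is None:
                exception = e
            return -1

    result = trampoline(value)

    # Back in pure Python: surface whatever the callback stashed.
    if exception is not None:
        raise exception

    return result
```

Without the "nonlocal" line, "if exception is None" inside the handler
would itself fail with UnboundLocalError, and the original exception
would never reach the caller.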

However, it seems like a lost battle. :( Some of the optimizations are
things that you should just not have to do, for example invoking
"x.kind" only once (because it's a property, not a field, so every
access is a function call). Another issue is that the bindings are
incomplete: for example, if you have a ForStmt you just get a Cursor,
and you are not able to access the individual expressions. As a result,
the following case, for example, is handled incorrectly in the
return-value-never-used test:

                static int f(void) { return 42; }
                static void g(void) {
                    for (f(); ; ) { } /* should warn, it doesn't */
                }

and I couldn't fix it without breaking "for (; f(); )" because AFAICT
the two are indistinguishable.
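
To make the ambiguity concrete, here is a toy model in plain Python. It
is not clang.cindex code: "ForStmt" and "children" below are
hypothetical stand-ins, and the model assumes that the child list seen
through the bindings simply omits the empty slots of the statement, which
is what the behaviour described above suggests:

```python
from typing import List, NamedTuple, Optional


class ForStmt(NamedTuple):
    """Toy stand-in for a C for-statement; NOT a clang.cindex type."""
    init: Optional[str]
    cond: Optional[str]
    inc: Optional[str]
    body: str


def children(stmt: ForStmt) -> List[str]:
    """Flatten to a child list, dropping empty slots (assumed behaviour)."""
    return [child for child in stmt if child is not None]


# for (f(); ; ) { }   -> the result of f() is discarded
unused_result = ForStmt(init="f()", cond=None, inc=None, body="{ }")

# for (; f(); ) { }   -> the result of f() is used as the loop condition
used_as_cond = ForStmt(init=None, cond="f()", inc=None, body="{ }")

# The two statements differ, but their flattened child lists do not.
assert unused_result != used_as_cond
assert children(unused_result) == children(used_as_cond)
```

Once the slot positions are gone, a check that only sees the child list
has no way to tell which of the two loops it is looking at.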

On top of this, using libclang directly should make it possible to use
the Matcher API (the same one used by clang-query), instead of writing
everything by hand. It may not be that useful in practice, but it's a
possibility.

Paolo