2012/12/11 David Malcolm <dmalcolm@xxxxxxxxxx>:
> A while back I ran my static checker on all of the Python extension
> modules in Fedora 17:
> http://fedoraproject.org/wiki/Features/StaticAnalysisOfPythonRefcounts
>
> I wrote various scripts to build the packages in a mock environment that
> injects my checker into gcc, then wrote various scripts to triage the
> results. I then filed bugs by hand for the most important results,
> writing some more scripts along the way to make the process easier.
>
> This led to some valuable bug fixes, but the mechanism for running the
> analysis was very ad hoc and doesn't scale.

I think it could be useful at least as a generic tool where one would
just do something like:

make CC=gcc-with-python-plugin

much as, some time ago, one could run

make CC=cgcc

to see what sparse would report. Or maybe think of it as a tool like
rpmlint.

> In particular, we don't yet have an automated way of rerunning the
> tests, whilst using the old results as a baseline. For example it would
> be most useful if only new problems could be reported, and if the system
> (whatever it is) remembered when a report has been marked as a true bug
> or as a false positive. Similarly, there's no automated way of saying
> "this particular test is bogus; ignore it for now".

Something like valgrind's .supp files?

> I'm wondering if there's a Free Software system for doing this kind of
> thing, and if not, I'm thinking of building it.
>
> What I have in mind is a web app backed by a database (perhaps
> "checker.fedoraproject.org" ?)

Reminds me of http://upstream-tracker.org/

> We'd be able to run all of the code in Fedora through static analysis
> tools, and slurp the results into the database: primarily my
> "cpychecker" work, but we could also run the clang analyzer etc. I've
> also been working on another as-yet-unreleased static analysis tool for
> which I'd want a db for the results. What I have working is a way to
> inject an analysis payload into gcc within a mock build, which dumps
> JSON report files into the chroot without disturbing the "real" build.
> The idea is then to gather up the JSON files and insert the report data
> into the db, tagging it with version information.
>
> There are two dimensions to the version information:
> (A) the version of the software under analysis
>     (name-version-release.arch)
> (B) the version of the tool doing the analysis
>
> We could use (B) within the system to handle the release cycle of a
> static analysis tool. Initially, any such analysis tools would be
> regarded as "experimental", and package maintainers could happily ignore
> the results of such a tool. The maintainer of an analysis tool could
> work on bug fixes and heuristics to get the signal:noise ratio of the
> tool up to an acceptable level, and then the status of the analysis tool
> could be upgraded to an "alpha" level or beyond.
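The harvesting step sounds simple enough to script. Something along
these lines is what I would picture (just a sketch in Python; the
function, directory walk and field names are all made up here, not
your actual report format):

import json
import os

def gather_reports(chroot, nvr_arch, tool_name, tool_version):
    """Walk the build chroot, load each JSON report dumped by the
    analysis payload, and tag it with (A) the package NVR.arch and
    (B) the version of the tool that produced it."""
    for dirpath, dirnames, filenames in os.walk(chroot):
        for filename in filenames:
            if not filename.endswith('.json'):
                continue
            with open(os.path.join(dirpath, filename)) as f:
                report = json.load(f)
            report['software'] = nvr_arch             # dimension (A)
            report['tool'] = {'name': tool_name,      # dimension (B)
                              'version': tool_version}
            yield report

Each tagged report could then be POSTed to the web app, or inserted
straight into the database.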
>
> Functional Requirements:
> * a collection of "reports" (not bugs):
>   * interprocedural control flow, potentially across multiple source
>     files (potentially with annotations, such as value of variables,
>     call stack?)
>   * syntax highlighting
>   * capturing of all relevant source (potentially with headers as
>     well?)
>   * visualization of control flow so that you can see the path
>     through the code that leads to the error
> * support for my cpychecker analysis
> * support for an as-yet-unreleased interprocedural static analysis
>   tool I've been working on
> * support for reports from the clang static analyzer
> * ability to mark a report as:
>   * a true bug (and a way to act on it, e.g. escalate to bugzilla or
>     to the relevant upstream tracker)
>   * a false positive (and a way for the analysis maintainer to act
>     on it)
>   * other bug associations with a report? (e.g. if the wording from
>     the tool's message could be improved)
> * ability to have a "conversation" about a report within the UI as
>   a series of comments (similar to bugzilla).
> * automated report matching between successive runs, so that the
>   markings can be inherited
> * scriptable triage, so that we can write scripts that mark all
>   reports matching a certain pattern e.g. as being bogus, as being
>   security sensitive, etc
> * potentially: debug data (from the analysis tool) associated with a
>   report, so that the maintainers of the tool can analyze a false
>   positive
> * ability to store crash results where some code broke a static
>   analysis tool, so that the tool can be fixed
> * association between reports and builds
> * association between builds and source packages
> * association between packages and people, so that you can see what
>   reports are associated with you (perhaps via the pkgdb?)
> * prioritization of reports to be generated by the tool
> * association between reports and tools (and tool versions)
> * "quality marking" of tool versions, so that we can ignore "alpha"
>   versions of tools and handle phasing in of a new static analysis
>   tool without spamming everyone
> * ability to view the signal:noise ratio of a version of a tool
>
> Nonfunctional requirements:
> * Free Software
> * sanely deployable within Fedora infrastructure
> * sane code, since we're likely to want to extend it (fwiw I'd be most
>   comfortable with a Python implementation).
> * able to scale to running all of Fedora through multiple tools
>   repeatedly
> * many simultaneous users
> * will want an authentication system so that we can associate comments
>   with users. Eventually we may want a way of embargoing
>   security-sensitive bugs found by the tool so that they're only
>   visible by a trusted cabal.
> * authentication system to support FAS, but not require it, in case
>   other people want to deploy such a tool. Maybe OpenID?
>
> Implementation ideas:
> * as well as a relational database for the usual things, perhaps a
>   lookaside of source files stored gzipped, with content-addressed storage
>   e.g. "0fcb0d45a6353e150e26f1fa54d11d7be86726b6" stored gzipped as:
>   objects/0f/cb0d45a6353e150e26f1fa54d11d7be86726b6
>   (yes, this looks a lot like git)
>
> Thoughts? Does such a thing already exist?

I am sure anything that can help in detecting potential runtime
failures is welcome.

> It might be fun to hack on this at the next FUDcon.

For anybody interested, here are the most relevant results after
searching a bit :-)

http://samate.nist.gov/index.php/Source_Code_Security_Analyzers.html
http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis
http://developers.slashdot.org/story/08/05/19/1510245/do-static-source-code-analysis-tools-really-work

> Dave

Paulo
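P.S. The content-addressed lookaside for the captured source files
could be tiny. A rough sketch (assuming Python and the SHA-1 naming
from your example; store_source is just a name I made up):

import gzip
import hashlib
import os

def store_source(objects_dir, data):
    """Store one source file (bytes) gzipped under a path derived from
    its SHA-1, e.g. objects/0f/cb0d45a6353e150e26f1fa54d11d7be86726b6."""
    sha = hashlib.sha1(data).hexdigest()
    subdir = os.path.join(objects_dir, sha[:2])
    if not os.path.isdir(subdir):
        os.makedirs(subdir)
    path = os.path.join(subdir, sha[2:])
    if not os.path.exists(path):   # content-addressed, so write once
        with gzip.open(path, 'wb') as f:
            f.write(data)
    return sha

The report records in the database would then only need to carry the
40-character hash to point back at the captured source.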