On Monday 04 February 2013 22:37:45 David Malcolm wrote:
> Content-addressed storage: they're named by SHA-1 sum of their contents,
> similar to how git does it, so if the bulk of the files don't change,
> they have the same SHA-1 sum and are only stored once. See e.g.:
> http://fedorapeople.org/~dmalcolm/static-analysis/2013-01-30/python-ethtool-0.7-4.fc19.src.rpm/static-analysis/sources/
> I probably should gzip them as well.

This can indeed save some space. I really like the idea. Maybe using a
true git store would give you an additional reduction in space
requirements thanks to its delta compression.

> Currently it's capturing all C files that have GCC invoked on them, or
> are mentioned in a warning (e.g. a .h file with an inline function with
> a bug). I could tweak things so it only captures files that are
> mentioned in a warning.

But then you would no longer be able to provide the context. If the error
trace goes through a function foo() defined in another module of the same
package, the user needs to look at its definition to confirm/waive the
defect.

> I guess the issue is: where do you store the knowledge about good vs bad
> warnings? My plan was to store it server-side. But we could generate
> summaries and have them available client-side. For example, if, say,
> cppcheck's "useClosedFile" test has generated 100 issues of which 5 have
> received human attention: 1 has been marked as a true positive, and 4
> have been marked as false positives. We could then say ("cppcheck",
> "useClosedFile") has a signal:noise ratio of 1:4. We could then
> generate a summary of these (tool, testID) ratios for use by clients,
> which could then apply a user-configurable signal:noise threshold, so
> you can say: "only show me results from tests that achieve 1:2 or
> better".

I did not realize you meant auto-filtering based on statistics from
users' input. Then maintaining the statistics at the server sounds like
a good idea. Being able to export a text file with scores per checker
should be just fine for the command-line tools; a sketch of what I have
in mind follows below. We will see whether statistics from users' input
can serve as a reliable criterion. The problem is that some defects tend
to be classified incorrectly without a deeper analysis of the report
(and code).
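Something like the following minimal sketch is what I imagine on the
client side. The score-file format and all names in it are my own,
invented for illustration; nothing like it is exported by the server
today:

#!/usr/bin/python
# Minimal sketch of client-side filtering based on per-checker
# signal:noise scores.  The input format (one line per checker:
# "TOOL TEST_ID TRUE_POSITIVES FALSE_POSITIVES") is hypothetical.

def load_scores(path):
    """Parse lines like 'cppcheck useClosedFile 1 4' into a dict
    mapping (tool, test_id) -> (true_positives, false_positives)."""
    scores = {}
    with open(path) as score_file:
        for line in score_file:
            tool, test_id, tp, fp = line.split()
            scores[(tool, test_id)] = (int(tp), int(fp))
    return scores

def passes_threshold(scores, tool, test_id, max_noise=2):
    """Keep a result if its checker scores 1:max_noise or better.
    Checkers without any human-reviewed statistics pass by default."""
    if (tool, test_id) not in scores:
        return True
    tp, fp = scores[(tool, test_id)]
    if tp == 0:
        return fp == 0
    return fp <= max_noise * tp

# ("cppcheck", "useClosedFile") with 1 true positive and 4 false
# positives has a 1:4 ratio, so a 1:2 threshold filters it out:
#     passes_threshold({("cppcheck", "useClosedFile"): (1, 4)},
#                      "cppcheck", "useClosedFile")  -> False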
> > The limitation of javascript-based UIs is that they are read-only.
> > Some developers prefer to go through the defects using their own
> > environment (eclipse, vim, emacs, ...) rather than a web browser so
> > that they can fix them immediately. We should support both approaches
> > I guess.
>
> Both approaches. What we could do is provide a tool ("fedpkg
> get-errors" ?) that captures the errors in the same output format as
> gcc. That way if you run it from, say, emacs, the *compilation* buffer
> has everything in the right format, and emacs' goto-next-error stuff
> works.

'fedpkg foo' is probably overkill at this point. My concern was rather
that we should not rely so much on the web server/browser approach in the
first place. I would like to have most of the equipment working just from
a terminal, without any server or browser. Any server solution can then
be easily built on top of it.

> Currently it's matching on 4 things:
>
> * by name of test tool (e.g. "clang-analyzer")

+ the class of defect? e.g. "useClosedFile" in your example above...

> * by filename of C file within the tarball (so e.g.
>   '/builddir/build/BUILD/python-ethtool-0.7/python-ethtool/etherinfo.c'
>   becomes 'python-ethtool/etherinfo.c', allowing different versions to
>   be compared)

With some part of the path, or just the base name?

> * function name (or None)

You want to work with full signatures if you are going to support
overloaded functions/methods in C++.

> * text of message

The messages cannot be checked for an exact match in certain cases. Have
a look at the rules we use in csdiff for the text messages:

http://git.fedorahosted.org/cgit/codescan-diff.git/plain/csfilter.cc

> See "make-comparative-report.py:ComparativeIssues" in
> https://github.com/fedora-static-analysis/mock-with-analysis/blob/master/reports/make-comparative-report.py

Actually my comment was not about the matching algorithm, but about the
way you present the comparative results. The UI is based on comparing a
pair of source files. In many cases you will fail to find a proper
pairing of source files between two versions of a package.
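To make the matching idea a bit more concrete, here is a rough sketch of
the kind of key I would compare on. The normalization rules below are
simplified illustrations of my own; they are not the actual rules
implemented in csfilter.cc:

#!/usr/bin/python
# Sketch of a matching key for pairing defects across two versions of a
# package.  The normalization rules are illustrative only, not the real
# rules from csdiff's csfilter.cc.

import re

def normalize_path(path):
    """Strip the mock-specific prefix and the NAME-VERSION directory so
    that '.../python-ethtool-0.7/python-ethtool/etherinfo.c' from two
    different versions maps to 'python-ethtool/etherinfo.c'."""
    path = re.sub(r'^/builddir/build/BUILD/', '', path)
    return re.sub(r'^[^/]+-[0-9][^/]*/', '', path)

def normalize_msg(msg):
    """Mask message parts that commonly differ between two runs, such as
    line numbers or buffer sizes embedded in the text."""
    return re.sub(r'\d+', 'N', msg)

def matching_key(tool, test_id, path, function, msg):
    # For C++, 'function' should carry the full signature so that
    # overloaded functions/methods do not collapse into a single key.
    return (tool, test_id, normalize_path(path), function,
            normalize_msg(msg))

The same normalized paths could also drive the pairing of source files
between two versions of a package in the comparative report.

Kamil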