Re: Static Analysis: proposed interchange format ("firehose")

On Thu, 2013-01-17 at 13:33 +0800, Daniel Veillard wrote:
> On Wed, Jan 16, 2013 at 03:53:56PM -0500, David Malcolm wrote:
> > This is a followup to my proposal in
> > http://lists.fedoraproject.org/pipermail/devel/2012-December/175232.html
> > 
> > I want a common output format for static analysis tools so that we can
> > easily slurp the results from different tools into a database and have a
> > common system for managing the results (marking false positives, having
> > automated de-duplication, etc).
> > 
> > (I like the name "firehose" for the overall system since it describes
> > the issue we'll have of managing the flood of data).
> > 
> > I came up with an XML format, which I've uploaded code to here:
> > https://github.com/fedora-static-analysis/firehose
> > 
> > Does this look sane?  I think that it should be possible to write
> 
>   Okay, taking the question from the XML side, i.e. analysing the
> firehose.rng schema driving the format. Points and remarks as I go
> through it:

Thanks!

>  - is the cwe attribute a number or free form? If a number, add
>    an explicit rule to check its type.

I've constrained it to be an integer as of:
https://github.com/fedora-static-analysis/firehose/commit/43a50c6763f718b4c8163b645bf5ce7a328f6efa

(I hope I got my RELAX-NG correct)
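
For anyone who wants to sanity-check the same constraint outside RELAX-NG, here is a minimal Python sketch. It assumes, purely for illustration, that cwe is an optional attribute on an <issue> element; the helper name check_cwe and the sample document are hypothetical and not part of firehose. Note that int() only approximates a schema-level integer type.

```python
import xml.etree.ElementTree as ET


def check_cwe(xml_text):
    """Return the cwe attribute values that int() cannot parse.

    Assumption (not taken from the schema): cwe is an optional
    attribute on <issue> elements.
    """
    root = ET.fromstring(xml_text)
    bad = []
    for issue in root.iter("issue"):
        value = issue.get("cwe")
        if value is None:
            continue  # the attribute is optional in this sketch
        try:
            int(value)
        except ValueError:
            bad.append(value)
    return bad


# Hypothetical sample: one numeric cwe, one free-form cwe, one without.
sample = '<analysis><issue cwe="401"/><issue cwe="CWE-401"/><issue/></analysis>'
print(check_cwe(sample))
```

A real check would be better done by running the document through a RELAX-NG validator against firehose.rng; this only shows what the constraint rejects.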


>  - the sut content choice is a bit weird: on one side you have text,
>    on the other you have <rpm>. I would still allow a free-form
>    description, but in an element at the same level as rpm,
>    something like
>    <choice>
>      <element name="description">
>        <text/>
>      </element>
>      <element name="rpm">
>        ...
>      </element>
>    </choice>
>    For the sake of wider usage, I would also make some room for
>    Debian, and expand that to be able to express a given file.
>    For example, allow extra details there, and make some
>    if not all of the attributes optional, to be able
>    to express independence from, say, the arch:
>    <sut>
>      <file>/usr/bin/xmllint</file>
>      <package type="rpm" name="libxml2" version="2.9.0" release="1.fc17"/>
>    </sut>
>    so: optional file element, extra type attribute, use package to not
>    feel tied to rpm, but use a type attribute to distinguish :-)

Yeah, I hadn't thought out that part of the schema very well.

I've already made the <sut> element optional, since I'm finding it
easier to add that information during post-processing.

I'm thinking that there are several cases:
* analysis of a source rpm
  * name, version, release, build architecture
* what would Debian want?
* analysis of a tarball or other archive
  * name, url, sha1sum, build architecture
* analysis of an scm checkout (e.g. from upstream git)
  * kind (git, svn, etc), url
* etc (what am I missing?)

Some possible examples of these:

<sut>
   <source-rpm name="python-ethtool" version="0.7" release="4.fc19"
               build-arch="x86_64"/>
</sut>

<sut>
   <tarball name="python-ethtool-0.7.tar.bz2">
       <hash alg="sha1">d8334fe3e1a9b31c8f94a4e10e516ddea617cfd2</hash>
   </tarball>
</sut>

<sut>
   <checkout scm="git"
             url="http://git.fedorahosted.org/cgit/python-ethtool.git/tag/?id=v0.7">
   </checkout>
</sut>
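
As a rough sketch of how a consumer might tell these cases apart, the following Python reads a <sut> element and reports which kind of child it holds. It assumes the tentative element and attribute names from the examples above (none of which are final), and the function name describe_sut is hypothetical.

```python
import xml.etree.ElementTree as ET


def describe_sut(xml_text):
    """Return (kind, details) for a <sut> element, or (None, {}) if empty.

    kind is the tag of the first child (e.g. "source-rpm", "tarball",
    "checkout"); details holds that child's attributes plus any
    <hash> children, keyed as "hash:<alg>".
    """
    sut = ET.fromstring(xml_text)
    if sut.tag != "sut":
        raise ValueError("expected <sut>, got <%s>" % sut.tag)
    child = next(iter(sut), None)
    if child is None:
        return None, {}
    details = dict(child.attrib)
    for h in child.findall("hash"):
        details["hash:" + h.get("alg", "unknown")] = (h.text or "").strip()
    return child.tag, details


tarball_example = """
<sut>
   <tarball name="python-ethtool-0.7.tar.bz2">
       <hash alg="sha1">d8334fe3e1a9b31c8f94a4e10e516ddea617cfd2</hash>
   </tarball>
</sut>
"""
print(describe_sut(tarball_example))
```

Keeping the kind in the element name means each case can carry its own attributes; Daniel's <package type="..."> suggestion would instead move that distinction into an attribute.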


>  - for notes I would separate them
>    <notes>
>      <note>...</note>
>      <note>...</note>
>    </notes>
>    since they are likely to be entered manually, and you may want to
>    track who entered them as you go.

I wasn't very clear in my posting; I'd meant these notes for extra
descriptive data emitted by the static analysis tool, with a vague idea
of a mini markup vocabulary for describing functions, variables, etc.
My cpychecker tool has knowledge about much of the CPython C API, and
knows the URLs for the API docs, so I was hoping to have some way of
providing links to those docs whenever it sees an API call within a
problematic function.


>  - I would use <where> instead of <point> myself but I understand your
>    logic too

There seem to be multiple kinds of location that checkers emit:
* file and line
* file, line and column
* file with a range, expressed as a pair of the above (LLVM can emit
  ranges running from a start line/column to an end line/column)
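
One way to model those three kinds with a single structure is sketched below in Python. The class and field names (Point, Location, describe) are made up for illustration and are not taken from the firehose schema: the column is optional, and a range is simply a start point plus an optional end point.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Point:
    line: int
    column: Optional[int] = None  # not every checker reports a column


@dataclass(frozen=True)
class Location:
    file: str
    start: Point
    end: Optional[Point] = None  # set only when the checker reports a range

    def describe(self):
        """Render as file:line, file:line:col, or file:start-end."""
        def fmt(p):
            if p.column is None:
                return str(p.line)
            return "%d:%d" % (p.line, p.column)

        text = "%s:%s" % (self.file, fmt(self.start))
        if self.end is not None:
            text += "-" + fmt(self.end)
        return text


print(Location("foo.c", Point(10)).describe())
print(Location("foo.c", Point(10, 4)).describe())
print(Location("foo.c", Point(10, 4), Point(12, 1)).describe())
```

With this shape, a tool that only knows file and line fills in the minimum, and richer tools add detail without needing a different element.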


> Long reply, but overall that looks mostly fine from my very narrow POV

Thanks for the review
Dave


-- 
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel


