Re: Smatch mailing list archives

Dan Carpenter <dan.carpenter@xxxxxxxxxx> · Mon, 10 May 2021 10:20:22 +0300

On Mon, May 10, 2021 at 06:17:27AM +0000, Reshetova, Elena wrote:
> > On Fri, May 07, 2021 at 02:22:37PM +0000, Reshetova, Elena wrote:
> > > Hi,
> > >
> > > I have been working for a while now on a new smatch pattern, but
> > > would really appreciate additional information points such as past
> > > email discussions, etc.
> > >
> > > So I am wondering if there is a way to browse through
> > > the archives of this mailing list in order to try to find the
> > > information I need?
> > 
> > Sorry, I don't think it's archived anywhere.  There isn't a lot of
> > traffic on the list.  About three times a year someone reports that
> > Smatch is crashing for them.
> > 
> > I'm always happy to answer questions if there is any way I can help?
> 
> Thank you Dan! I am pretty new with smatch so that's why I was
> hoping to browse through the existing mails to see if my simple questions
> are already answered, but here is my current issue. 
> 
> What is the best way to create identifiers for the findings that certain smatch
> pattern finds in the kernel? Let's say I have a new pattern that is able to find
> different problematic places and report them in usual smatch way: errors and
> warnings with file name, line number, function name, etc. 
> Now for our pattern in order to be sure that the reported issue exists/does not
> exists, somebody needs to go and look at the code manually and make a call. 
> After this, it would be nice to mark this place as safe/concern in the report and be
> able to transfer these results for kernel versions bumps (5.11->5.12, etc.) as soon as
> the code in this function where finding was reported has not changed (and there
> might be multiple findings per function).
> 
> What is the best way of doing it? 
> I was first thinking of using some simple hash for the reported line (lines around, relative
> position within the reported function),
> but now I think I need also to hash the whole function in addition to the finding itself. 
> 
> Then the logic of transferring the result would be:
> 
> For each finding calculate: 
>  1. finding_line_hash: the hash of the line that resulted in finding (becomes a unique id
>      within the function).
> 2. finding_function_hash: the hash of the function that produced the finding (becomes a
>    unique global id within the kernel) and helps to determine if the function has not been
>    changed between the kernel versions. 
> 
> Logic for the result transfer: 
> 
> If both finding_line_hash and finding_function_hash match between the two smatch reports
> for two different versions, then it is relatively safe to transfer this concrete smatch finding
> and its manual audit result automatically.
> 
> Does it make sense overall? If yes, what is the easiest way in smatch to get hash data for 
> 1 and 2? I.e. get full reported line as a string and full function content as a string? 
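
As a rough illustration of the transfer logic described above, here is a
minimal Python sketch.  It assumes the line text and the function text
have already been extracted as strings by some other means; how to get
them out of Smatch is not shown.  The names Finding, finding_key() and
transfer_verdicts() are hypothetical, not part of Smatch.

```python
import hashlib
from typing import Dict, List, NamedTuple, Tuple


class Finding(NamedTuple):
    """One reported finding (hypothetical structure, not a Smatch type)."""
    function_text: str  # full source text of the enclosing function
    line_text: str      # source text of the reported line


def _digest(text: str) -> str:
    # Collapse whitespace first so that re-indentation alone does not
    # change the hash.
    canonical = " ".join(text.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def finding_key(finding: Finding) -> Tuple[str, str]:
    # (finding_function_hash, finding_line_hash) as described in the mail.
    return (_digest(finding.function_text), _digest(finding.line_text))


def transfer_verdicts(old_audit: Dict[Tuple[str, str], str],
                      new_findings: List[Finding]) -> Dict[Tuple[str, str], str]:
    """Carry over a verdict only when both hashes match; otherwise flag
    the finding for a fresh manual review."""
    result = {}
    for finding in new_findings:
        key = finding_key(finding)
        result[key] = old_audit.get(key, "needs-review")
    return result
```

Note that two identical lines in the same function produce the same key
in this sketch, so the relative position within the function would need
to be added to tell them apart.
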

I use a script, smatch_scripts/new_bugs.pl.  It strips out the variable
names inside the single quotes, any numbers, and the parentheses, so a
warning ends up looking like this:

Original warning:

    fs/fuse/virtio_fs.c:1468 virtio_fs_get_tree() error: double free of 'fm'

Stripped:

    fs.fuse.virtio_fs.c.virtio_fs_get_tree_error:_double_free_of_''

You could hash the stripped string.  Looking at it now, the variable
name is actually useful and shouldn't be stripped out.  Doh...
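
Something along these lines would work.  This is only a sketch in Python,
not what new_bugs.pl actually does: normalize_warning() and finding_id()
are made-up names, and the exact normalization rules are an assumption.
It drops the line number and the parentheses but keeps the variable name
by default.

```python
import hashlib
import re


def normalize_warning(warning: str, keep_names: bool = True) -> str:
    """Reduce a Smatch warning to a form that survives line-number shifts."""
    # Split "path:line rest" so that only the line number is dropped and
    # digits inside the path are left alone.
    match = re.match(r"^(\S+?):(\d+)\s+(.*)$", warning.strip())
    if match is None:
        raise ValueError("unexpected warning format: %r" % warning)
    path, _line, rest = match.groups()

    # Drop the "()" after the function name.
    rest = rest.replace("()", "")

    # Optionally blank out whatever is inside the single quotes.
    if not keep_names:
        rest = re.sub(r"'[^']*'", "''", rest)

    # Drop standalone numbers; digits inside identifiers are kept.
    rest = re.sub(r"\b\d+\b", "", rest)

    return path + " " + rest


def finding_id(warning: str, keep_names: bool = True) -> str:
    """Short hex identifier derived from the normalized warning."""
    normalized = normalize_warning(warning, keep_names)
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]
```

With this, the same warning reported at line 1468 in one kernel version
and at a different line in the next gets the same identifier, while a
double free of a different variable in the same function gets a
different one.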

I don't know how the zero-day bot marks warnings as dealt with or
not.  There is also the Aiaiai project
(https://www.openhub.net/p/aiaiai), which probably has a feature for
marking warnings as reviewed.

regards,
dan carpenter