Powered by Linux
Re: [RFC PATCH 5/7] kernel_user_data: track parameter __untagged annotations — Semantic Matching Tool

Re: [RFC PATCH 5/7] kernel_user_data: track parameter __untagged annotations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 07, 2019 at 04:55:44PM +0100, Andrew Murray wrote:
> On Mon, Oct 07, 2019 at 04:35:43PM +0100, Andrew Murray wrote:
> > The kernel provides a number of annotations that can be used by smatch
> > for static analysis, for example __user and __kernel - the kernel relies
> > on the GCC attribute 'address_space(x)' to implement these annotations.
> > By adding a new annotation named __untagged with address space 5 we can
> > use this annotation to indicate function parameters which should not
> > be treated as tagged addresses.
> > 
> > Let's give Smatch an understanding of this annotation, when used by
> > function parameters, by adding a '[u]' tag to the end of their user value
> > ranges. We ensure that upon merges of range lists states, we only preserve
> > the tag if both states contain the tag.
> > 
> > In other words, if two functions A and B both call function C (let's assume
> > these functions all have a single parameter) - then the range list for
> > function C's parameter will only contain the tag if the parameters for both
> > A and B are also tagged. Therefore we know that if if we make a comparison
> > between a tagged and untagged address in C then we know it doesn't need
> > to flag a detection as the values concerned have all come from functions
> > that are marked as __untagged.
> > 
> > If only function A was tagged, then we could raise a detection in C - and
> > rely on parameter tracing to walk up both callers of C to build call trees
> > and exclude anything and its parents that are tagged.
> > 
> > In practice we will always raise a detection and let the smdb.py script
> > figure out which call trees to display.
> 
> This approach works well, however it sometimes falls over due to gaps in
> the Smatch flow-tracking/parameter-tracking. E.g.
> 
> void func1(unsigned long __untagged addr, unsigned long len)
> {
>     unsigned long tmp = 1;
>     tmp += addr;
>     tmp = func2(tmp, len, 4);
>     find_vma(NULL, tmp);
> }
> 
> The parameter addr is annotated as untagged, therefore its range list may
> look something like '0-u64max[u]'.

Ideally, it would never be 0-u64max and [u] at the same time, right?
The [u] is already a work around because Smatch is parsing it badly.

> 
> The first line of code results in a range list of tmp to be '1';
> 
> The second line of code assigns addr to tmp, resulting in tmp's range list
> changing to '1-u64max[u]'
> 
> The third line of code passes tmp to a function and assigns the result back
> to tmp. Just like any other function Smatch keeps track of all the callers
> to func2, and when it parses the code in func2 it will consider 'tmp' to
> have a range list of all those callers combined - if there is a caller of
> func2 anywhere in the kernel that doesn't provide tmp with the '[u]' tag
> then the tag will be dropped. This results in the return value of func2 not
> containing a tag and resulting in 'tmp' inside func1 loosing it's '[u]' tag.

Yeah.  Gotcha.  We don't know if the return value is tagged untagged,
but maybe we could store something to say that it inherits the tag from
param $0.

> Sometimes Smatch is clever and can describe the range of values returned by a
> function in terms relative to the input arguments, in that case the tag would
> be preserved.

Yeah.

> 
> This means that find_vma, if it detects a tagged pointer will flag it seeing as
> 'tmp' does not contain the '[u]' tag. Fortunately, in this case we rely on
> smdb.py to walk the tree from the detection, therefore it will see that func1
> calls find_vma, and func1 'addr' is tagged and thus it can suppress the
> detection. Unfortunately this relies on the ability of Smatch to detect that
> 'tmp' as passed to find_vma is the 'addr' in the function arguments - this
> parameter tracking doesn't always work (for the same reasons as described). It is
> for these reasons that using __untagged annotations is hit or miss.

I feel like my temptation is always to do a hack around.  Like we could
say func2() never calls get_user() or copy_from_user(), it just takes
user data from the parameters and returns it.  Smatch already
differentiates these situations with USER_DATA and USER_DATA_SET so we
could say if a function returns USER_DATA set and only passed untagged
to the function then the return is also untagged.

Would that work?

regards,
dan carpenter




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux