On Mon, Oct 07, 2019 at 04:55:44PM +0100, Andrew Murray wrote: > On Mon, Oct 07, 2019 at 04:35:43PM +0100, Andrew Murray wrote: > > The kernel provides a number of annotations that can be used by smatch > > for static analysis, for example __user and __kernel - the kernel relies > > on the GCC attribute 'address_space(x)' to implement these annotations. > > By adding a new annotation named __untagged with address space 5 we can > > use this annotation to indicate function parameters which should not > > be treated as tagged addresses. > > > > Let's give Smatch an understanding of this annotation, when used by > > function parameters, by adding a '[u]' tag to the end of their user value > > ranges. We ensure that upon merges of range lists states, we only preserve > > the tag if both states contain the tag. > > > > In other words, if two functions A and B both call function C (let's assume > > these functions all have a single parameter) - then the range list for > > function C's parameter will only contain the tag if the parameters for both > > A and B are also tagged. Therefore we know that if if we make a comparison > > between a tagged and untagged address in C then we know it doesn't need > > to flag a detection as the values concerned have all come from functions > > that are marked as __untagged. > > > > If only function A was tagged, then we could raise a detection in C - and > > rely on parameter tracing to walk up both callers of C to build call trees > > and exclude anything and its parents that are tagged. > > > > In practice we will always raise a detection and let the smdb.py script > > figure out which call trees to display. > > This approach works well, however it sometimes falls over due to gaps in > the Smatch flow-tracking/parameter-tracking. E.g. > > void func1(unsigned long __untagged addr, unsigned long len) > { > unsigned long tmp = 1; > tmp += addr; > tmp = func2(tmp, len, 4); > find_vma(NULL, tmp); > } > > The parameter addr is annotated as untagged, therefore its range list may > look something like '0-u64max[u]'. Ideally, it would never be 0-u64max and [u] at the same time, right? The [u] is already a work around because Smatch is parsing it badly. > > The first line of code results in a range list of tmp to be '1'; > > The second line of code assigns addr to tmp, resulting in tmp's range list > changing to '1-u64max[u]' > > The third line of code passes tmp to a function and assigns the result back > to tmp. Just like any other function Smatch keeps track of all the callers > to func2, and when it parses the code in func2 it will consider 'tmp' to > have a range list of all those callers combined - if there is a caller of > func2 anywhere in the kernel that doesn't provide tmp with the '[u]' tag > then the tag will be dropped. This results in the return value of func2 not > containing a tag and resulting in 'tmp' inside func1 loosing it's '[u]' tag. Yeah. Gotcha. We don't know if the return value is tagged untagged, but maybe we could store something to say that it inherits the tag from param $0. > Sometimes Smatch is clever and can describe the range of values returned by a > function in terms relative to the input arguments, in that case the tag would > be preserved. Yeah. > > This means that find_vma, if it detects a tagged pointer will flag it seeing as > 'tmp' does not contain the '[u]' tag. Fortunately, in this case we rely on > smdb.py to walk the tree from the detection, therefore it will see that func1 > calls find_vma, and func1 'addr' is tagged and thus it can suppress the > detection. Unfortunately this relies on the ability of Smatch to detect that > 'tmp' as passed to find_vma is the 'addr' in the function arguments - this > parameter tracking doesn't always work (for the same reasons as described). It is > for these reasons that using __untagged annotations is hit or miss. I feel like my temptation is always to do a hack around. Like we could say func2() never calls get_user() or copy_from_user(), it just takes user data from the parameters and returns it. Smatch already differentiates these situations with USER_DATA and USER_DATA_SET so we could say if a function returns USER_DATA set and only passed untagged to the function then the return is also untagged. Would that work? regards, dan carpenter