Powered by Linux
Detecting user data on base types — Semantic Matching Tool

Detecting user data on base types

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

I'm investigating the use of smatch to identify user provided data in specific
kernel functions, however I'm having difficulty in making this work correctly.

ARM64 provides a feature where the top byte of a pointer is ignored by the
hardware and as a result userspace can use this byte for its own purpose. The
ABI is being changed meaning that userspace is permitted to pass pointers to
the kernel with the top byte being non-zero. These pointers get into the kernel
via a variety of ioctls and syscalls and are usually passed around as unsigned
longs. However, despite the hardware ignoring the top byte, many memory related
kernel functions don't handle this and so the top byte must be untagged/stripped
prior to entering these functions (e.g. [1]).

I'd like to make use of smatch to identify code that doesn't untag memory
addresses where needed. My initial approach was to use sparse, and to annotate
functions that required an untagged address as follows:

--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -19,6 +19,7 @@
 # define __cond_lock(x,c)      ((c) ? ({ __acquire(x); 1; }) : 0)
 # define __percpu      __attribute__((noderef, address_space(3)))
 # define __rcu         __attribute__((noderef, address_space(4)))
+# define __untagged    __attribute__((address_space(5)))
 # define __private     __attribute__((noderef))
 extern void __chk_user_ptr(const volatile void __user *);
 extern void __chk_io_ptr(const volatile void __iomem *);

--- a/evaluate.c
+++ b/evaluate.c
@@ -1470,6 +1470,14 @@ static int compatible_assignment_types(struct expression *expr, struct symbol *t
                return 0;
        }
 
+       if (target->ctype.as == 5 && source->ctype.as != 5) {
+               warning(expr->pos, "Tag mismatch in %s (%d)", where, expr->pos.line);
+               info(expr->pos, "   asn = %d", source->ctype.as);
+               info(expr->pos, "   asn = %d", target->ctype.as);
+               *rp = cast_to(*rp, target);
+               return 0;               
+       }
+
        return 1;
 }
 
This hack works and correctly identifies code that isn't using an __untagged
annotated variable when it should. (A untagged_addr macro is used to strip the
tag and add the __untagged annotation to the returned unsigned longs).  However,
it results in the propogation of __untagged annotations across the kernel up the
call chain - for example for the find_vma function: every kernel function which
calls a function which calls find_vma will have this annotation - this includes
functions that are no where near user-space (e.g. core memory management code).

I then attempted to use smatch, this seems appropiate as there are existing
check_xxx files which do similar tasks, e.g. check_spectre, check_uninitialized,
etc. My approach is to only annotate the functions that actually need the address
pointer to be untagged (rather than that, its caller, and their caller, etc) and
then rely on the smatch flow to identify if the address pointer is influenced by
userspace or not.

I first attempted this by hooking into expressions that use annotated symbols
(annotated by the function prototype arguments) and test to see if the expression
includes data provided by user. I seem to fail when testing for user provided
data - can smatch track user provided data that are base types (e.g. unsigned
long)? E.g.

void func2(int x)
{
	x++;
}

void func1(int x)
{
	x++;
	func2(x);
}

void start(void *b)
{
	void *a;
	copy_from_user(a, b, 10);
	func1(a[0]);
}

In the above, smatch seems to be able to identify the callsite for func1 as
having user data, however not for x in the x++ expressions. I guess smatch would
need to track each use of 'a' and propogate a 'userspace' state for each assignment,
e.g. to x. (Is smatch currently limited to tracking user data for pointers rather
than base types?).

Another approach could be to hook into expressions that use annotated symbols - 
but only for functions where there is a syscall or ioctl in any of the possible
call stacks - is this possible?

I'm very new to smatch/sparse so I would be grateful for any pointers (excuse the
pun) on how best to approach this. Also let me know if there is a more appropriate
place for this discussion.

FYI - An earlier discussion related to this can be found here [2].

[1] https://patchwork.kernel.org/patch/9691093/
[2] https://patchwork.kernel.org/patch/10581535/

Thanks,

Andrew Murray



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux