Hello, I'm investigating the use of smatch to identify user provided data in specific kernel functions, however I'm having difficulty in making this work correctly. ARM64 provides a feature where the top byte of a pointer is ignored by the hardware and as a result userspace can use this byte for its own purpose. The ABI is being changed meaning that userspace is permitted to pass pointers to the kernel with the top byte being non-zero. These pointers get into the kernel via a variety of ioctls and syscalls and are usually passed around as unsigned longs. However, despite the hardware ignoring the top byte, many memory related kernel functions don't handle this and so the top byte must be untagged/stripped prior to entering these functions (e.g. [1]). I'd like to make use of smatch to identify code that doesn't untag memory addresses where needed. My initial approach was to use sparse, and to annotate functions that required an untagged address as follows: --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -19,6 +19,7 @@ # define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0) # define __percpu __attribute__((noderef, address_space(3))) # define __rcu __attribute__((noderef, address_space(4))) +# define __untagged __attribute__((address_space(5))) # define __private __attribute__((noderef)) extern void __chk_user_ptr(const volatile void __user *); extern void __chk_io_ptr(const volatile void __iomem *); --- a/evaluate.c +++ b/evaluate.c @@ -1470,6 +1470,14 @@ static int compatible_assignment_types(struct expression *expr, struct symbol *t return 0; } + if (target->ctype.as == 5 && source->ctype.as != 5) { + warning(expr->pos, "Tag mismatch in %s (%d)", where, expr->pos.line); + info(expr->pos, " asn = %d", source->ctype.as); + info(expr->pos, " asn = %d", target->ctype.as); + *rp = cast_to(*rp, target); + return 0; + } + return 1; } This hack works and correctly identifies code that isn't using an __untagged annotated variable when it should. (A untagged_addr macro is used to strip the tag and add the __untagged annotation to the returned unsigned longs). However, it results in the propogation of __untagged annotations across the kernel up the call chain - for example for the find_vma function: every kernel function which calls a function which calls find_vma will have this annotation - this includes functions that are no where near user-space (e.g. core memory management code). I then attempted to use smatch, this seems appropiate as there are existing check_xxx files which do similar tasks, e.g. check_spectre, check_uninitialized, etc. My approach is to only annotate the functions that actually need the address pointer to be untagged (rather than that, its caller, and their caller, etc) and then rely on the smatch flow to identify if the address pointer is influenced by userspace or not. I first attempted this by hooking into expressions that use annotated symbols (annotated by the function prototype arguments) and test to see if the expression includes data provided by user. I seem to fail when testing for user provided data - can smatch track user provided data that are base types (e.g. unsigned long)? E.g. void func2(int x) { x++; } void func1(int x) { x++; func2(x); } void start(void *b) { void *a; copy_from_user(a, b, 10); func1(a[0]); } In the above, smatch seems to be able to identify the callsite for func1 as having user data, however not for x in the x++ expressions. I guess smatch would need to track each use of 'a' and propogate a 'userspace' state for each assignment, e.g. to x. (Is smatch currently limited to tracking user data for pointers rather than base types?). Another approach could be to hook into expressions that use annotated symbols - but only for functions where there is a syscall or ioctl in any of the possible call stacks - is this possible? I'm very new to smatch/sparse so I would be grateful for any pointers (excuse the pun) on how best to approach this. Also let me know if there is a more appropriate place for this discussion. FYI - An earlier discussion related to this can be found here [2]. [1] https://patchwork.kernel.org/patch/9691093/ [2] https://patchwork.kernel.org/patch/10581535/ Thanks, Andrew Murray