Hi Dan, Thanks for the quick and helpful response. On Wed, May 29, 2019 at 10:49:33PM +0300, Dan Carpenter wrote: > On Wed, May 29, 2019 at 07:47:23PM +0100, Andrew Murray wrote: > > void func1(int x) > > { > > x++; > > func2(x); > > } > > > > void start(void *b) > > { > > void *a; > > copy_from_user(a, b, 10); > > func1(a[0]); > > } > > Unfortunately, Smatch is pretty crap at tracking array elements so it > doesn't track that a[0] is user controlled. > > Otherwise it generally does track that an int is controlled by the user > and the user can pick a number between 0-10 or whatever. Like maybe the > a=0-100 but only for in-kernel API and 0-10 can be set from the ioctl, > the two ranges are tracked separately. > > > > > In the above, smatch seems to be able to identify the callsite for func1 as > > having user data, however not for x in the x++ expressions. I guess smatch would > > need to track each use of 'a' and propogate a 'userspace' state for each assignment, > > e.g. to x. (Is smatch currently limited to tracking user data for pointers rather > > than base types?). > > x++ is problematic for Smatch, also. If we're outside of a loop then > it says 0-10 becomes 1-11, but if we're inside a loop then it becomes > 1-s32max. OK thanks for the explanation. > > Another problem is that Smatch doesn't understand how the sign_extend64() > function works so it doesn't understand the untagged_addr() macro. :/ > I see one bug here and a missing feature... Right now Smatch thinks > that the return value is totally unknown and not user controlled. This > is a fixable issue by implementing SPECIAL_LEFTSHIFT in rl_binop(). Actually the ability to treat the return type as not user controlled makes this a little simpler... We want to flag a potential issue when a tagged address is used in a function that wants untagged addresses (as marked by the address space annotation). Therefore we can simply test for "ctype.as == 5 && is_user_rl(expr)" to meet this condition. When a user correctly calls untagged_addr prior to calling these functions the macro helpfully drops the user controlled flag which stops the condition above triggering. Is there a way of getting smatch to drop the 'user controlled' flag without relying on a bug? > > Anyway, let me send you some updated test code without the array. Copy > it to the smatch/ directory and do: > > ./smatch --info -p=kernel test.c | tee warns.txt > ./smatch_data/db/create_db.sh -p=kernel warns.txt > ./smatch -p=kernel test.c > > That maybe gives you a better idea of where we're at. Thanks this was really helpful, I'm not sure I really understood how the database was created prior to this. > > I guess my other question would be where would you want to print the > warning? Thanks to your help I believe I have this working as follows: diff --git a/check_list.h b/check_list.h index b1d24c504ba5..f7551e7c5215 100644 --- a/check_list.h +++ b/check_list.h @@ -192,6 +192,7 @@ CK(check_nospec_barrier) CK(check_spectre) CK(check_spectre_second_half) CK(check_implicit_dependencies) +CK(check_tagged) /* wine specific stuff */ CK(check_wine_filehandles) diff --git a/check_tagged.c b/check_tagged.c new file mode 100644 index 000000000000..d14a81a6c33a --- /dev/null +++ b/check_tagged.c @@ -0,0 +1,49 @@ +#include "smatch.h" +#include "smatch_extra.h" + +static void untagged_check(struct expression *expr) +{ + char *name; + + if (parse_error) + return; + + if (is_impossible_path()) + return; + + if (expr->type == EXPR_PREOP) + return; + + if (is_user_rl(expr)) { + if (expr->symbol && expr->symbol->ctype.as == 5) { + name = expr_to_str(expr); + sm_warning("potential tagged issue '%s'", name); + free_string(name); + } + } +} + +void check_tagged(int id) +{ + if (option_project != PROJ_KERNEL) + return; + + add_hook(&untagged_check, EXPR_HOOK); +} This correctly identifies functions that have been annotated with the __untagged annotation that use data from userspace. However, it's not actually that useful... For example, if I annotate the find_vma function, then it will flag this as a potential tagged issue given that its called (from at least some contexts) with userspace data. However find_vma is called all over the kernel, and so it's difficult to figure out which caller called find_vma with user data. It's possible to use ./smatch_scripts/unlocked_paths.pl with a garbage lock and 'find_vma' target to look for all the callers where this is a leaf function, but this doesn't track the call parameters to exclude paths where the leaf annotated function contains user data. I haven't fully understood var_user_rl - but it looks like functions such as match_user_assign_function propogate the user_data tag on each run to update state, and thus it's probably not easy to get the original source of user data after is_user_rl determines its user. Is that correct? I would be great if there was a way to obtain the function that provided the original source of user data - then it would be possible to write a script that shows the path (call chain) between the two functions - I hoped smatch_scripts/ call_tree.pl could do this but it doesn't seem to do anything for me. Thanks, Andrew Murray > > regards, > dan carpenter > -------------- > > #include "check_debug.h" > > int copy_from_user(void *dest, void *src, long size); > > #define __s64 signed long long > #define __u64 unsigned long long > #define u64 unsigned long long > #define __u8 unsigned char > > /** > * sign_extend64 - sign extend a 64-bit value using specified bit as sign-bit > * @value: value to sign extend > * @index: 0 based bit index (0<=index<64) to sign bit > */ > static inline __s64 sign_extend64(__u64 value, int index) > { > __u8 shift = 63 - index; > return (__s64)(value << shift) >> shift; > } > > #define untagged_addr(addr) ((__typeof__(addr))sign_extend64((u64)(addr), 55)) > > void func1(unsigned long x) > { > __smatch_user_rl(x); It took more than a glance to realise what this was doing! > } > > void *p; > int main(void) > { > unsigned long a; > > copy_from_user(&a, p, sizeof(a)); > __smatch_user_rl(a); > __smatch_user_rl(untagged_addr(a)); > func1(a); > func1(untagged_addr(a)); > > return 0; > } > Thanks, Andrew Murray