On Thu, May 30, 2019 at 10:03:30AM +0100, Andrew Murray wrote: > > Another problem is that Smatch doesn't understand how the sign_extend64() > > function works so it doesn't understand the untagged_addr() macro. :/ > > I see one bug here and a missing feature... Right now Smatch thinks > > that the return value is totally unknown and not user controlled. This > > is a fixable issue by implementing SPECIAL_LEFTSHIFT in rl_binop(). > > Actually the ability to treat the return type as not user controlled makes > this a little simpler... > > We want to flag a potential issue when a tagged address is used in a function > that wants untagged addresses (as marked by the address space annotation). > Therefore we can simply test for "ctype.as == 5 && is_user_rl(expr)" to meet > this condition. > > When a user correctly calls untagged_addr prior to calling these functions > the macro helpfully drops the user controlled flag which stops the condition > above triggering. > > Is there a way of getting smatch to drop the 'user controlled' flag without > relying on a bug? We know the high byte is either 0 or 0xff so we could do something like: struct symbol *type; sval_t invalid = { .type = &ullong_ctype, .vaule = 1ULL << 62 }; type = get_type(expr); if (!type || type_bits(type) != 64) return; if (!get_user_rl(expr, &rl)) return; if (rl_has_sval(rl, invalid)) sm_warning("potential tagged issue '%s'", name); I will fix the untagged_addr() handling very soon. > Thanks to your help I believe I have this working as follows: > > diff --git a/check_list.h b/check_list.h > index b1d24c504ba5..f7551e7c5215 100644 > --- a/check_list.h > +++ b/check_list.h > @@ -192,6 +192,7 @@ CK(check_nospec_barrier) > CK(check_spectre) > CK(check_spectre_second_half) > CK(check_implicit_dependencies) > +CK(check_tagged) > > /* wine specific stuff */ > CK(check_wine_filehandles) > diff --git a/check_tagged.c b/check_tagged.c > new file mode 100644 > index 000000000000..d14a81a6c33a > --- /dev/null > +++ b/check_tagged.c > @@ -0,0 +1,49 @@ > +#include "smatch.h" > +#include "smatch_extra.h" > + > +static void untagged_check(struct expression *expr) > +{ > + char *name; > + > + if (parse_error) > + return; > + > + if (is_impossible_path()) > + return; > + > + if (expr->type == EXPR_PREOP) > + return; > + > + if (is_user_rl(expr)) { > + if (expr->symbol && expr->symbol->ctype.as == 5) { > + name = expr_to_str(expr); > + sm_warning("potential tagged issue '%s'", name); > + free_string(name); > + } > + } > +} > + > +void check_tagged(int id) > +{ > + if (option_project != PROJ_KERNEL) > + return; > + > + add_hook(&untagged_check, EXPR_HOOK); > +} > > This correctly identifies functions that have been annotated with the __untagged > annotation that use data from userspace. However, it's not actually that > useful... > > For example, if I annotate the find_vma function, then it will flag this as a > potential tagged issue given that its called (from at least some contexts) with > userspace data. However find_vma is called all over the kernel, and so it's > difficult to figure out which caller called find_vma with user data. > > It's possible to use ./smatch_scripts/unlocked_paths.pl with a garbage lock and > 'find_vma' target to look for all the callers where this is a leaf function, but > this doesn't track the call parameters to exclude paths where the leaf annotated > function contains user data. Huh... I haven't thought about that script in years... :/ What I do is I download the latest linux-next ever day and rebuild my DB every day. Each time you rebuild it, the call tree gets filled out a bit. After about a week then the DB is as complete as it is going to get. Then I use the smatch_data/db/smdb.py script to figure out the warnings. I should add an option so that it only shows callers which pass user data. Each call site has a unique caller ID. > I haven't fully understood var_user_rl - but it looks like functions such as > match_user_assign_function propogate the user_data tag on each run to update > state, and thus it's probably not easy to get the original source of user > data after is_user_rl determines its user. Is that correct? Use the get_user_rl() function instead. The var_user_rl() is only supposed to be used internally, to do math. > > I would be great if there was a way to obtain the function that provided the > original source of user data - then it would be possible to write a script that > shows the path (call chain) between the two functions - I hoped smatch_scripts/ > call_tree.pl could do this but it doesn't seem to do anything for me. That's the smatch_data/db/smdb.py script. I have it as an alias in vim so I can do CTRL-c to see how a function is called or CTRL-r to see what it returns # map <C-r> :! vim_smdb return_states <cword> <CR> :execute 'edit' system("cat ~/.smdb_tmp/cur") <CR> # map <C-c> :! vim_smdb <cword> <CR> :execute 'edit' system("cat ~/.smdb_tmp/cur") <CR> regards, dan carpenter