Powered by Linux
Re: Detecting user data on base types — Semantic Matching Tool

Re: Detecting user data on base types

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dan,

Thanks for the quick and helpful response.

On Wed, May 29, 2019 at 10:49:33PM +0300, Dan Carpenter wrote:
> On Wed, May 29, 2019 at 07:47:23PM +0100, Andrew Murray wrote:
> > void func1(int x)
> > {
> > 	x++;
> > 	func2(x);
> > }
> > 
> > void start(void *b)
> > {
> > 	void *a;
> > 	copy_from_user(a, b, 10);
> > 	func1(a[0]);
> > }
> 
> Unfortunately, Smatch is pretty crap at tracking array elements so it
> doesn't track that a[0] is user controlled.
> 
> Otherwise it generally does track that an int is controlled by the user
> and the user can pick a number between 0-10 or whatever.  Like maybe the
> a=0-100 but only for in-kernel API and 0-10 can be set from the ioctl,
> the two ranges are tracked separately.
> 
> > 
> > In the above, smatch seems to be able to identify the callsite for func1 as
> > having user data, however not for x in the x++ expressions. I guess smatch would
> > need to track each use of 'a' and propogate a 'userspace' state for each assignment,
> > e.g. to x. (Is smatch currently limited to tracking user data for pointers rather
> > than base types?).
> 
> x++ is problematic for Smatch, also.  If we're outside of a loop then
> it says 0-10 becomes 1-11, but if we're inside a loop then it becomes
> 1-s32max.

OK thanks for the explanation.

> 
> Another problem is that Smatch doesn't understand how the sign_extend64()
> function works so it doesn't understand the untagged_addr() macro.  :/
> I see one bug here and a missing feature...  Right now Smatch thinks
> that the return value is totally unknown and not user controlled.  This
> is a fixable issue by implementing SPECIAL_LEFTSHIFT in rl_binop().

Actually the ability to treat the return type as not user controlled makes
this a little simpler...

We want to flag a potential issue when a tagged address is used in a function
that wants untagged addresses (as marked by the address space annotation).
Therefore we can simply test for "ctype.as == 5 && is_user_rl(expr)" to meet
this condition.

When a user correctly calls untagged_addr prior to calling these functions
the macro helpfully drops the user controlled flag which stops the condition
above triggering.

Is there a way of getting smatch to drop the 'user controlled' flag without
relying on a bug?

> 
> Anyway, let me send you some updated test code without the array.  Copy
> it to the smatch/ directory and do:
> 
> ./smatch --info -p=kernel test.c | tee warns.txt
> ./smatch_data/db/create_db.sh -p=kernel warns.txt
> ./smatch -p=kernel test.c
> 
> That maybe gives you a better idea of where we're at.

Thanks this was really helpful, I'm not sure I really understood how the
database was created prior to this.

> 
> I guess my other question would be where would you want to print the
> warning?

Thanks to your help I believe I have this working as follows:

diff --git a/check_list.h b/check_list.h
index b1d24c504ba5..f7551e7c5215 100644
--- a/check_list.h
+++ b/check_list.h
@@ -192,6 +192,7 @@ CK(check_nospec_barrier)
 CK(check_spectre)
 CK(check_spectre_second_half)
 CK(check_implicit_dependencies)
+CK(check_tagged)
 
 /* wine specific stuff */
 CK(check_wine_filehandles)
diff --git a/check_tagged.c b/check_tagged.c
new file mode 100644
index 000000000000..d14a81a6c33a
--- /dev/null
+++ b/check_tagged.c
@@ -0,0 +1,49 @@
+#include "smatch.h"
+#include "smatch_extra.h"
+
+static void untagged_check(struct expression *expr)
+{
+       char *name;
+
+       if (parse_error)
+               return;
+
+       if (is_impossible_path())
+               return;
+
+       if (expr->type == EXPR_PREOP)
+               return;
+
+       if (is_user_rl(expr)) {
+               if (expr->symbol && expr->symbol->ctype.as == 5) {
+                       name = expr_to_str(expr);
+                       sm_warning("potential tagged issue '%s'", name);
+                       free_string(name);
+               }
+       }
+}
+
+void check_tagged(int id)
+{
+       if (option_project != PROJ_KERNEL)
+               return;
+
+       add_hook(&untagged_check, EXPR_HOOK);
+}
 
This correctly identifies functions that have been annotated with the __untagged
annotation that use data from userspace. However, it's not actually that
useful...

For example, if I annotate the find_vma function, then it will flag this as a
potential tagged issue given that its called (from at least some contexts) with
userspace data. However find_vma is called all over the kernel, and so it's
difficult to figure out which caller called find_vma with user data. 

It's possible to use ./smatch_scripts/unlocked_paths.pl with a garbage lock and
'find_vma' target to look for all the callers where this is a leaf function, but
this doesn't track the call parameters to exclude paths where the leaf annotated
function contains user data.

I haven't fully understood var_user_rl - but it looks like functions such as
match_user_assign_function propogate the user_data tag on each run to update
state, and thus it's probably not easy to get the original source of user
data after is_user_rl determines its user. Is that correct?

I would be great if there was a way to obtain the function that provided the
original source of user data - then it would be possible to write a script that
shows the path (call chain) between the two functions - I hoped smatch_scripts/
call_tree.pl could do this but it doesn't seem to do anything for me.

Thanks,

Andrew Murray

> 
> regards,
> dan carpenter
> --------------
> 
> #include "check_debug.h"
> 
> int copy_from_user(void *dest, void *src, long size);
> 
> #define __s64 signed long long
> #define __u64 unsigned long long
> #define u64 unsigned long long
> #define __u8 unsigned char
> 
> /**
>  * sign_extend64 - sign extend a 64-bit value using specified bit as sign-bit
>  * @value: value to sign extend
>  * @index: 0 based bit index (0<=index<64) to sign bit
>  */
> static inline __s64 sign_extend64(__u64 value, int index)
> {
> 	__u8 shift = 63 - index;
> 	return (__s64)(value << shift) >> shift;
> }
> 
> #define untagged_addr(addr) ((__typeof__(addr))sign_extend64((u64)(addr), 55))
> 
> void func1(unsigned long x)
> {
> 	__smatch_user_rl(x);

It took more than a glance to realise what this was doing!

> }
> 
> void *p;
> int main(void)
> {
> 	unsigned long a;
> 
> 	copy_from_user(&a, p, sizeof(a));
> 	__smatch_user_rl(a);
> 	__smatch_user_rl(untagged_addr(a));
> 	func1(a);
> 	func1(untagged_addr(a));
> 
> 	return 0;
> }
> 

Thanks,

Andrew Murray



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux