Re: Tracking Implicit Dependencies

Dan Carpenter <dan.carpenter@xxxxxxxxxx> · Fri, 29 Sep 2017 23:50:28 +0300

Huh...  Sorry for not responding earlier.  I missed your email in my
other inbox.  I know that students are always on a deadline.  Please CC
me and I'll try to respond more quickly.

On Sat, Sep 23, 2017 at 01:37:45AM -0400, Andrew Zhu Aday wrote:
> Hi Smatch team,
> 
> My name is Andrew Aday and I'm a CS undergrad at Columbia University.
> 
> I'm currently doing research on kernel fuzzing via the system call interface,
> and I'm at the point now where I need to track "implicit dependencies" between
> system calls. I'm writing this email to ask: how appropriate is Smatch
> for this task?

It's very appropriate.  It's still probably a big project though...

> 
> To explain:
> 
> Syscall A is an "explicit dependency" of syscall B if A produces a
> resource which B uses. For example, `open` and `read`.
> 
> Syscall A is an "implicit dependency" of syscall B if A can affect the
> control flow/coverage of B, but B doesn't use A's return value. For
> example, `mlockall` and `msync`: calling `mlockall` before `msync` (with
> MS_INVALIDATE) will cause the latter to fail with -EBUSY, and thus
> influences its control flow.

Hm...  That's tricky.
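
To make the `mlockall`/`msync` example concrete, here is a small Linux-only userspace sketch (it is not Smatch code). The function name `observe_mlockall_msync()` and its return codes are made up for illustration. It issues the same `msync(MS_ASYNC | MS_INVALIDATE)` call before and after `mlockall(MCL_CURRENT)`; on Linux the second call is expected to fail with EBUSY, since `msync()` refuses MS_INVALIDATE on locked memory. `mlockall()` itself may be refused without privileges or with a low RLIMIT_MEMLOCK, and the sketch reports that case separately.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper. Returns:
 *   1  dependency observed: msync() succeeds before mlockall() and
 *      fails with EBUSY after it,
 *   0  mlockall() was not permitted here, so nothing could be observed,
 *  -1  setup failed or the kernel behaved differently than expected. */
static int observe_mlockall_msync(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int result = -1;
	void *buf;

	assert(page > 0);
	buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* No memory lock yet: MS_INVALIDATE is accepted. */
	if (msync(buf, (size_t)page, MS_ASYNC | MS_INVALIDATE) != 0)
		goto out;

	/* Lock every current mapping, including buf. */
	if (mlockall(MCL_CURRENT) != 0) {
		result = 0;
		goto out;
	}

	/* Identical call and arguments; only the earlier mlockall()
	 * differs, yet the kernel now takes the EBUSY path. */
	if (msync(buf, (size_t)page, MS_ASYNC | MS_INVALIDATE) == -1 &&
	    errno == EBUSY)
		result = 1;
	munlockall();
out:
	munmap(buf, (size_t)page);
	return result;
}
```

Neither call consumes the other's return value, which is exactly what makes this kind of dependency invisible to resource-based (explicit) tracking.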

> 
> My naive approach:
> 
> Use static analysis to build out the CFG for each syscall, and create a
> mapping from each system call to the global variables it accesses. Mark
> two syscalls as implicitly dependent if the intersection of their global
> variable accesses is nonempty (after pruning the especially common ones,
> e.g. GFP_KERNEL).
> 

GFP_KERNEL is a define, so it's just a number, not a global variable.
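
The intersection rule from the quoted paragraph can be sketched in a few lines of C. This assumes the per-syscall sets of global variable names have already been collected somehow; `struct syscall_globals`, `implicitly_dependent()`, and the variable name in the stoplist are all hypothetical. Macros such as GFP_KERNEL never need a stoplist entry, because the preprocessor has replaced them with constants before any analysis sees the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-syscall summary: names of the globals reachable from
 * the syscall's entry point, as some earlier analysis pass recorded them. */
struct syscall_globals {
	const char *name;
	const char *const *globals; /* NULL-terminated list */
};

/* Globals touched by nearly every syscall; sharing them says little.
 * The name is illustrative only. Macros like GFP_KERNEL never appear
 * here because they are compile-time constants, not variables. */
static const char *const stoplist[] = { "hypothetical_common_lock", NULL };

static bool in_list(const char *const *list, const char *s)
{
	for (; *list; list++)
		if (strcmp(*list, s) == 0)
			return true;
	return false;
}

/* Naive rule: a and b are implicitly dependent if they share at least
 * one global that is not on the stoplist. */
static bool implicitly_dependent(const struct syscall_globals *a,
				 const struct syscall_globals *b)
{
	for (const char *const *g = a->globals; *g; g++)
		if (!in_list(stoplist, *g) && in_list(b->globals, *g))
			return true;
	return false;
}
```

Checking every pair this way is quadratic in the number of syscalls, which is cheap for a few hundred entry points; the hard part is producing accurate per-syscall sets in the first place.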

> I've looked briefly at Smatch, but I wanted to get your assessment before
> I go any further. Is what I'm trying to do feasible? I see there's a
> `FUNCTION_CALL_HOOK` I can plug into to recursively collect all the
> global variables under a given syscall. But I'm very unfamiliar with
> static analysis in general, so I'm not sure how straightforward this is.
> 
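
The recursive collection described above amounts to a depth-first walk of the call graph, taking the union of the globals touched along the way. The sketch below assumes a call graph has already been built; it does not use Smatch's actual hook API, and `struct func`, `collect()`, and the size limits are hypothetical. The `seen` array keeps recursive or mutually recursive functions from looping forever. Indirect calls through function pointers, which are common in the kernel, are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FUNCS   16
#define MAX_EDGES   8
#define MAX_GLOBALS 32

/* Hypothetical per-function facts: the globals the body touches directly
 * and the functions it calls directly. */
struct func {
	const char *name;
	const char *globals[MAX_EDGES]; /* unused slots stay NULL */
	int ncallees;
	int callees[MAX_EDGES];         /* indices into the same table */
};

struct name_set {
	const char *items[MAX_GLOBALS];
	size_t n;
};

/* Add name to the set unless it is already present. */
static void set_add(struct name_set *s, const char *name)
{
	for (size_t i = 0; i < s->n; i++)
		if (strcmp(s->items[i], name) == 0)
			return;
	assert(s->n < MAX_GLOBALS);
	s->items[s->n++] = name;
}

/* Depth-first walk from fn, collecting every global reachable from it. */
static void collect(const struct func *tab, int fn, bool seen[MAX_FUNCS],
		    struct name_set *out)
{
	assert(fn >= 0 && fn < MAX_FUNCS);
	if (seen[fn])
		return;
	seen[fn] = true;
	for (int i = 0; i < MAX_EDGES && tab[fn].globals[i]; i++)
		set_add(out, tab[fn].globals[i]);
	assert(tab[fn].ncallees <= MAX_EDGES);
	for (int i = 0; i < tab[fn].ncallees; i++)
		collect(tab, tab[fn].callees[i], seen, out);
}
```

Running `collect()` once per syscall entry point yields the per-syscall sets that the intersection step needs.
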
> Please let me know your thoughts! And of course, ask me any questions or
> let me know if I need to explain better.
> 
> P.S. Why does the cross-function database become more accurate every
> time it's rebuilt?
> 

The cross-function database records the value of "x" that frob() passes
to frob_two().  But the first time the DB is built, we don't know the
value of "x" inside frob() itself, because we haven't recorded yet how
frob() is called.  The next time we build the DB we can see how frob()
is called, so the information about "x" becomes more filled out.
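
Here is a toy model of that effect, assuming each DB build only reads what the previous build recorded. It is not Smatch code: the extra function caller(), `struct param_info`, `rebuild_db()`, and `builds_until_stable()` are hypothetical. In the model, caller() calls frob(5) and frob(x) calls frob_two(x). The first build learns that frob() receives 5; only the second build can pass that on to frob_two().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { CALLER, FROB, FROB_TWO, NR_FUNCS };

/* What the DB knows about parameter "x" of each function. */
struct param_info {
	bool known;
	int value;
};

/* One build: every call site is re-analyzed using only old_db. */
static void rebuild_db(const struct param_info old_db[NR_FUNCS],
		       struct param_info new_db[NR_FUNCS])
{
	/* caller() has no parameter we track. */
	new_db[CALLER] = (struct param_info){ .known = false, .value = 0 };
	/* caller() calls frob(5): a literal, known on every build. */
	new_db[FROB] = (struct param_info){ .known = true, .value = 5 };
	/* frob(x) calls frob_two(x): only as good as what the previous
	 * build recorded about frob()'s own "x". */
	new_db[FROB_TWO] = old_db[FROB];
}

static bool same_db(const struct param_info a[NR_FUNCS],
		    const struct param_info b[NR_FUNCS])
{
	for (int i = 0; i < NR_FUNCS; i++)
		if (a[i].known != b[i].known ||
		    (a[i].known && a[i].value != b[i].value))
			return false;
	return true;
}

/* Start from an empty DB and rebuild until nothing changes. Returns the
 * number of builds that added information; db holds the final state. */
static int builds_until_stable(struct param_info db[NR_FUNCS])
{
	struct param_info next[NR_FUNCS];
	int builds = 0;

	for (int i = 0; i < NR_FUNCS; i++)
		db[i] = (struct param_info){ .known = false, .value = 0 };
	for (;;) {
		rebuild_db(db, next);
		if (same_db(db, next))
			return builds;
		memcpy(db, next, sizeof(next));
		builds++;
	}
}
```

In this model each extra level of call depth costs one more rebuild before the value reaches the bottom of the chain, which is why rebuilding a few times keeps filling in more information.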

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe smatch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html