Re: [PATCH 0/5] mm/kasan: advanced check

Wengang Wang <wen.gang.wang@xxxxxxxxxx> · Tue, 21 Nov 2017 11:17:36 -0800

On 2017/11/21 1:54, Dmitry Vyukov wrote:
On Mon, Nov 20, 2017 at 9:29 PM, Wengang <wen.gang.wang@xxxxxxxxxx> wrote:

On 11/20/2017 12:20 PM, Dmitry Vyukov wrote:
On Mon, Nov 20, 2017 at 9:05 PM, Wengang <wen.gang.wang@xxxxxxxxxx> wrote:

On 11/20/2017 12:41 AM, Dmitry Vyukov wrote:

The reason I didn't submit the vchecker to mainline is that I didn't
find
the case that this tool is useful in real life. Most of the system
broken
case
can be debugged by other ways. Do you see the real case that this tool
is
helpful?
Hi,

Yes, this is the main question here.
How is it going to be used in real life? How widely?

I think the owner check can be enabled in the cases where KASAN is used.
--
That is that we found there is memory issue, but don't know how it
happened.

But KASAN generally pinpoints the corruption as it happens. Why do we
need something else?

Currently (without this patch set) kasan can't detect the overwritten issues
that happen on allocated memory.

Say, A allocated a 128 bytes memory and B write to that memory at offset 0
with length 100 unexpectedly.  Currently kasan won't report error for any
writing to the offset 0 with len <= 128 including the B writting.  This
patch lets kasan report the B writing to offset 0 with length 100.

So this will be used for manual debugging and you don't have plans to
annotate kernel code with additional tags, right?
I am not sure what do you mean by "manual debugging". What is needed to 
use the owner check is:
The memory user needs to do:
1)  code change: register the checker with the allowed functions
2)  code change: bind the memory to the checker
3)  recompile the kernel
4)  run with the recompiled kernel and reproduce the issue

By "additional tags", if you meant "add some explanation comment", I 
think one can refer to the commit message about the code change;
if you meant "additional kernel config item to enable/disable code", I 
have no such plan.  If no "owner checker" is registered, it just acts 
like the basic kasan (without this patch) with almost same performance. 
Even with "owner checker" registered,  and memories are bound to the 
checker,  it's still the rare case to do the owner check. So the 
overheard caused by owner check is slight. I don't find the reason we 
need an additional kernel config.

If this meant to be used by kernel developers during debugging, this
feature needs to be documented in Documentation/dev-tools/kasan.rst
including an example. It's hard to spread knowledge about such
features especially if there are no mentions in docs. Documentation
can then be quickly referenced e.g. as a suggestion of how to tackle a
particular bug.
Yes, this is a good idea. I was/am thinking so.

General comments:

1. The check must not affect fast-path. I think we need to move it
into kasan_report (and then rename kasan_report to something else).
Closer to what Joonsoo did in his version, but move then check even
further. This will also make inline instrumentation work because it
calls kasan_report, then kasan_report will do the additional check and
potentially return without actually reporting a bug.
The idea is that the check reserves some range of bad values in shadow
and poison the object with that special value. Then all accesses to
the protected memory will be detected as bad and go into kasan_report.
Then kasan_report will do the additional check and potentially return
without reporting.
This has 0 overhead when the feature is not used, enables inline
instrumentation and is less intrusive.
The owner check can be moved to kasan_report() by letting the poison check
routine return "possible violation" when the memory is bound to a owner
and then kasan_report() will get the chance to do further (owner) check.

Well I wonder how that moving would benefit.
If the purpose is to remove overhead,   the moving didn't remove of any 
run of
owner check. It would just move it to a different place and it will run 
just a bit later.
I think even current implementation, it has almost 0 overhead when no 
memory is
bound to owners.  The owner check is performed only when the memory is 
bound (the
bound check is light), if memory is not bound, no owner check is performed.
I am predicting the code that has owner check routine moved to 
kasan_report(), it
should be like this:
(fake code)
in poison check routines:
       ...
       after all case that returns "Yes",
       if bound check returns true (memory is bound):    --> bound 
check is here
             return "possible"
       ...
in the caller of poison check routines:
       ...
       if poison check routine returns "yes" or "possible":
             calls kasan_report()

in kasan_report():
       ....
       if no basic violation found:
           run owner 
check                                                  --> owner check 
is here
       ...

Current code is like this:
in poison check routines:
       ...
       after all case that returns "yes",
       if bound check returns true (memory is bound):  --> bound check 
is here
               run owner 
check                                            --> owner check is here

Comparing to current implementation,
anyway the "bound check" is done either in the poison check routines or 
in kasan_report().
anyway the "owner check" is done either in the poison check routines or 
in kasan_report().
I don't see we have reduced number of calls of "bound check" and/or 
"owner check".
Can you pinpoint which part will be reduced?

If the purpose is to make inline instrumentation work for owner check, 
it interests
me!  This implementation only works fine in outline instrumentation and 
seems the
poison checks are not called at all with inline compile type. Could you 
share more on this?

The badness of moving owner check to kasan_report() is that it breaks 
the function
clearness in the code.  From this point of view, check is just check, it 
should say "yes" or
"no", not "possible";  report is just report, no checks should be 
performed in report.

2. Moving this to a separate .c/.h files sounds like a good idea.
kasan.c is a bit of a mess already. If we do (1), changes to kasan.c
will be minimal. Again closer to what Joonsoo did.
If the owner checks would remain in the poison check routines, it would 
be in kasan.c.
If we have enough points to support the moving, say that makes inline 
instrumentation
work, it can be in a separated .c/.h and yes that would be better then.

3. We need to rename it from "advanced" to something else (owner
check?). Features must be named based on what they do, rather then how
advanced they are. If we add other complex checks, how should we name
them? even_more_advanced?

LoL,  No and Yes.
The feature I am adding is "owner check" and I define it as one of the 
"advanced check",
By looking at the patch its self (especially enum kasan_adv_chk_type in 
patch 4/5)  , you
can see, I was leaving spaces for other kind of "advanced checks". And 
(future) different
"advanced checks" can be added -- say "old value validation", "new value 
validation"
 -- though the new value is not  supported by compiler yet.  But yes 
the name "advanced"
is really not what I want, but I failed to find an accurate one. How do 
you think?

I am fine with adding such feature provided that it does not affect
performance/memory consumption if not used, works with inline
instrumentation and is separated into separate files. But it also
needs to be advertised somehow among kernel developers, otherwise only
you guys will use it.
So far it should has almost same performance if feature is not used; 
definitely
no more memory consumption.  Now it doesn't work with inline 
instrumentation,
could you share more information on how to make it also work with inline 
mode?
It technically can be moved to separated files. I will add the doc.

thanks,
wengang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>