On 2017/11/22 11:29, Wengang Wang wrote:
On 2017/11/22 4:04, Andrey Ryabinin wrote:
On 11/18/2017 01:30 AM, Wengang Wang wrote:
KASAN advanced check: I'm going to add this feature.
Currently KASAN detects use-after-free and out-of-bounds problems.
It cannot find overwrites of allocated memory.
We sometimes hit this kind of issue: a structure (usually dynamically
allocated) is corrupted, with some of its fields overwritten with
unreasonable values, and the kernel may panic because of those
overwritten values. We know the fields were overwritten somehow, but we
have no easy way to find out which code path did the overwrite. The
advanced check is meant to help in this scenario.
The idea is to define a memory owner. When a write access comes from a
non-owner, an error should be reported. Normally the write accesses to a
given structure happen in only a few, or perhaps a dozen, functions if
the structure is not that complicated. We call those functions "allowed
functions". The work of defining the owner and binding memory to the
owner is expected to be done by the memory consumer. In the above case,
the memory consumer registers the owner as the set of functions that
have write access to the structure, then binds all the structures to
that owner. KASAN will then do the "owner check" after the basic checks.
As for the implementation, KASAN provides an API for its users to
register their allowed functions. The API returns a token to the user.
At run time, users bind the memory ranges they are interested in to the
check they registered. KASAN then checks writes to the bound memory
ranges against the allowed functions.
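The register/bind/check flow described above can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not
the patch set's code: every name here (owner_register, owner_bind,
owner_write_ok, struct func_range) is hypothetical, the allowed
functions are modelled as address ranges, and the caller's instruction
pointer is passed in explicitly instead of being taken from the
instrumented store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; they are NOT the patch set's API. */
#define MAX_CHECKS   8
#define MAX_FUNCS    16
#define MAX_BINDINGS 32

/* One allowed function, modelled as the address range of its code. */
struct func_range { uintptr_t start; size_t size; };

/* One registered check: the set of allowed functions (the "owner"). */
struct owner_check { struct func_range funcs[MAX_FUNCS]; size_t nr_funcs; };

/* One bound memory range and the token of the check guarding it. */
struct binding { uintptr_t addr; size_t len; int token; };

static struct owner_check checks[MAX_CHECKS];
static size_t nr_checks;
static struct binding bindings[MAX_BINDINGS];
static size_t nr_bindings;

/* Register a set of allowed functions; returns a token, or -1 on error. */
int owner_register(const struct func_range *funcs, size_t n)
{
    if (nr_checks == MAX_CHECKS || n > MAX_FUNCS)
        return -1;
    for (size_t i = 0; i < n; i++)
        checks[nr_checks].funcs[i] = funcs[i];
    checks[nr_checks].nr_funcs = n;
    return (int)nr_checks++;
}

/* Bind the range [addr, addr + len) to a previously registered check. */
int owner_bind(int token, uintptr_t addr, size_t len)
{
    if (token < 0 || (size_t)token >= nr_checks || nr_bindings == MAX_BINDINGS)
        return -1;
    bindings[nr_bindings++] = (struct binding){ addr, len, token };
    return 0;
}

/* Return true if a write of `size` bytes at `addr`, issued from the
 * instruction at `ip`, is allowed. Unbound memory is always allowed. */
bool owner_write_ok(uintptr_t addr, size_t size, uintptr_t ip)
{
    for (size_t b = 0; b < nr_bindings; b++) {
        const struct binding *bd = &bindings[b];

        if (addr + size <= bd->addr || addr >= bd->addr + bd->len)
            continue; /* the write does not overlap this bound range */

        const struct owner_check *c = &checks[bd->token];
        bool allowed = false;

        for (size_t f = 0; f < c->nr_funcs; f++) {
            if (ip >= c->funcs[f].start &&
                ip < c->funcs[f].start + c->funcs[f].size)
                allowed = true;
        }
        if (!allowed)
            return false; /* non-owner write to bound memory */
    }
    return true;
}
```

In this sketch a consumer would call owner_register() once with the
ranges of its writer functions, call owner_bind() for each structure it
allocates, and the checker would call owner_write_ok() on every store.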
NAK. We don't add APIs with no users in the kernel.
If nothing in the kernel uses this API then there is no way to tell
if this works or not.
If the concern is only whether this works or not, I think the last patch
in the set is a user of the owner check. It shows that the owner check
works well.
A copy of one report looks like this:
[ 448.477923] ==================================================================
[ 448.565140] BUG: KASAN: Non-owner write access violation in funcB+0xd/0x3d [test_kasan]
[ 448.661699] Write of size 1 at addr ffff881fa516e90c by task insmod/5606
[ 448.742314]
[ 448.760514] CPU: 3 PID: 5606 Comm: insmod Tainted: G B OE 4.14.0-rc8 #9
[ 448.760517] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
[ 448.760519] Call Trace:
[ 448.760529] dump_stack+0x63/0x8d
[ 448.760538] print_address_description+0x7c/0x290
[ 448.760547] kasan_report+0x274/0x3d0
[ 448.760554] ? kasan_kmalloc+0xad/0xe0
[ 448.760566] ? funcB+0xd/0x3d [test_kasan]
[ 448.760578] ? kasan_adv+0x1f3/0x1f3 [test_kasan]
[ 448.760585] __asan_store1+0xa4/0xb0
[ 448.760597] ? funcB+0xd/0x3d [test_kasan]
[ 448.760608] funcB+0xd/0x3d [test_kasan]
[ 448.760620] kasan_adv+0x12a/0x1f3 [test_kasan]
[ 448.760633] ? copy_user_test+0x1ba/0x1ba [test_kasan]
[ 448.760641] ? percpu_counter_add_batch+0x22/0xa0
[ 448.760646] ? 0xffffffffa0dd0000
[ 448.760657] ? funcA+0x20/0x20 [test_kasan]
[ 448.760667] ? do_munmap+0x52e/0x6a0
[ 448.760675] ? vm_munmap+0xd8/0x110
[ 448.760684] ? kasan_slab_free+0x89/0xc0
[ 448.760690] ? kfree+0x95/0x190
[ 448.760702] ? kasan_adv+0x1f3/0x1f3 [test_kasan]
[ 448.760714] ? copy_user_test+0x1b3/0x1ba [test_kasan]
[ 448.760726] kmalloc_tests_init+0x84/0xf89 [test_kasan]
[ 448.760733] do_one_initcall+0xa6/0x210
[ 448.760740] ? initcall_blacklisted+0x150/0x150
[ 448.760748] ? kasan_unpoison_shadow+0x36/0x50
[ 448.760755] ? kasan_kmalloc+0xad/0xe0
[ 448.760762] ? kasan_unpoison_shadow+0x36/0x50
[ 448.760770] ? __asan_register_globals+0x87/0xa0
[ 448.760779] do_init_module+0xf4/0x312
[ 448.760786] load_module+0x283a/0x3120
[ 448.760802] ? layout_and_allocate+0x18b0/0x18b0
[ 448.760809] ? vmap_page_range_noflush+0x2e3/0x400
[ 448.760821] SYSC_init_module+0x1c3/0x1e0
[ 448.760826] ? SYSC_init_module+0x1c3/0x1e0
[ 448.760831] ? load_module+0x3120/0x3120
[ 448.760839] ? SYSC_finit_module+0x1a0/0x1a0
[ 448.760845] SyS_init_module+0xe/0x10
[ 448.760851] do_syscall_64+0xe3/0x270
[ 448.760860] entry_SYSCALL64_slow_path+0x25/0x25
[ 448.760865] RIP: 0033:0x35f80e923a
[ 448.760868] RSP: 002b:00007ffc8835e9a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000af
[ 448.760875] RAX: ffffffffffffffda RBX: 00007ffc8835f4ff RCX: 00000035f80e923a
[ 448.760879] RDX: 00000000016d3010 RSI: 0000000000044f78 RDI: 00007fedf04b9010
[ 448.760883] RBP: 00000000016d3010 R08: 0000000000081000 R09: 0000000000041000
[ 448.760886] R10: 00000035f80db710 R11: 0000000000000202 R12: 0000000000044f78
[ 448.760890] R13: 0000000000080000 R14: 00007fedf04b9010 R15: 0000000000000003
[ 448.760894]
[ 448.779081] Allocated by task 5606:
[ 448.821206] save_stack_trace+0x1b/0x20
[ 448.821212] save_stack+0x46/0xd0
[ 448.821218] kasan_kmalloc+0xad/0xe0
[ 448.821224] kmem_cache_alloc_trace+0xf0/0x1e0
[ 448.821235] kasan_adv+0xe1/0x1f3 [test_kasan]
[ 448.821246] kmalloc_tests_init+0x84/0xf89 [test_kasan]
[ 448.821252] do_one_initcall+0xa6/0x210
[ 448.821256] do_init_module+0xf4/0x312
[ 448.821261] load_module+0x283a/0x3120
[ 448.821265] SYSC_init_module+0x1c3/0x1e0
[ 448.821269] SyS_init_module+0xe/0x10
[ 448.821275] do_syscall_64+0xe3/0x270
[ 448.821282] return_from_SYSCALL_64+0x0/0x6a
[ 448.821284]
[ 448.839471] Freed by task 0:
[ 448.874305] (stack is not available)
[ 448.917458]
[ 448.935655] The buggy address belongs to the object at ffff881fa516e900
[ 448.935655] which belongs to the cache kmalloc-64 of size 64
[ 449.084234] The buggy address is located 12 bytes inside of
[ 449.084234] 64-byte region [ffff881fa516e900, ffff881fa516e940)
[ 449.223442] The buggy address belongs to the page:
[ 449.281170] page:ffffea007e945b80 count:1 mapcount:0 mapping: (null) index:0x0
[ 449.377720] flags: 0x2fffff80000100(slab)
[ 449.426094] raw: 002fffff80000100 0000000000000000 0000000000000000 00000001802a002a
[ 449.519525] raw: dead000000000100 dead000000000200 ffff881fff40f640 0000000000000000
[ 449.612961] page dumped because: kasan: bad access detected
[ 449.680054]
[ 449.698244] Memory state around the buggy address:
[ 449.755972]  ffff881fa516e800: fb fb fb fb fc fc fc fc fb fb fb fb fb fb fb fb
[ 449.843154]  ffff881fa516e880: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
[ 449.930345] >ffff881fa516e900: 30 30 30 30 00 00 00 06 fc fc fc fc fc fc fc fc
[ 450.017535]                       ^
[ 450.059660]  ffff881fa516e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 450.146846]  ffff881fa516ea00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 450.234027] ==================================================================
The "^" points to the second "30" in the line above it (not sure it
shows correctly after copy/paste).
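The numbers in the report are consistent with each other, and the
arithmetic can be checked with a small userspace sketch. It assumes
generic KASAN's granularity of one shadow byte per 8 bytes of memory;
the helper names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN maps 8 bytes of memory to 1 shadow byte. */
#define SHADOW_SCALE_SHIFT 3

/* Byte offset of the accessed address inside its object. */
static inline uint64_t object_offset(uint64_t addr, uint64_t object_start)
{
    return addr - object_start;
}

/* Index of the shadow byte for `addr` within a "Memory state" row whose
 * label is `row_start` (each row shows 16 shadow bytes, 128 bytes). */
static inline unsigned int shadow_index_in_row(uint64_t addr, uint64_t row_start)
{
    return (unsigned int)((addr - row_start) >> SHADOW_SCALE_SHIFT);
}
```

With addr ffff881fa516e90c and the object (and row) starting at
ffff881fa516e900, the offset is 0xc = 12 bytes, matching "located 12
bytes inside of", and the shadow index is 12 >> 3 = 1, which is the
second byte of the marked row.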
thanks,
wengang
In a production kernel, we don't want unnecessary APIs without users
because they would consume binary size (a pure waste of space) and
leave "dead" code.
KASAN code is a bit different from other kernel components: KASAN
itself is for debugging purposes only.
When KASAN is enabled, the APIs would have potential users and the
code is not "dead" code.
The increase in binary size would be acceptable, since a kernel with
KASAN enabled has only a short life: it is used to find the root cause,
and once the root cause is found it is no longer used. Also, a
KASAN-enabled kernel is used by a limited set of users who have a
particular issue. I say "potential users" because this functionality
itself is used dynamically, or in other words as a one-shot use. The
functionality is helpful.
I think that even for KASAN itself, we don't know whether it works or
not when it is not enabled. Before I tried this, I was curious whether
it could work well; after testing it, I know it works.
If we don't give users the chance, they will never know there is such
a functionality and will never benefit from it.
Besides, I'm a bit skeptical about the usefulness of this feature. The
kinds of issues that the advanced check is supposed to catch are almost
always just some sort of longstanding use-after-free, which eventually
should be caught by KASAN.
Yes, with luck, the issue can be caught by the UAF check.
But on busy production systems, the memory is very likely to be
reallocated rather than staying in the free state for very long. That
is, I think an overwrite of allocated memory is more likely to happen
than a UAF. When an overwrite of allocated memory happens, the UAF
check has no chance to detect the problem.
KASAN is helpful for detecting problematic memory usage, and so is this
patch set!
I really hope this can be included so that developers can benefit from
it.
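The difference between the two cases can be shown with a toy model in
userspace C. It is a deliberate simplification, not kernel code: one
memory slot carries an allocation state and a single owner_id that
stands in for the set of allowed functions, and all names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum slot_state { SLOT_FREE, SLOT_ALLOCATED };

/* Toy model of one memory object (hypothetical, for illustration). */
struct slot { enum slot_state state; int owner_id; };

/* Basic (UAF-style) check: only the allocation state matters. */
static bool basic_check_ok(const struct slot *s)
{
    return s->state == SLOT_ALLOCATED;
}

/* Owner check: the memory must be allocated AND the writer must own it. */
static bool owner_check_ok(const struct slot *s, int writer_id)
{
    return basic_check_ok(s) && s->owner_id == writer_id;
}

/* User 1 frees the slot but keeps a stale pointer and writes through it.
 * Returns a bitmask: bit 0 set if the basic check flags the write,
 * bit 1 set if the owner check flags it. */
int stale_write_detection(bool reallocated_before_write)
{
    struct slot s = { SLOT_ALLOCATED, 1 }; /* user 1 allocates */
    int flagged = 0;

    s.state = SLOT_FREE;                   /* user 1 frees it */
    if (reallocated_before_write) {        /* busy system: user 2 gets it */
        s.state = SLOT_ALLOCATED;
        s.owner_id = 2;
    }

    if (!basic_check_ok(&s))
        flagged |= 1;
    if (!owner_check_ok(&s, 1))            /* the stale write is by user 1 */
        flagged |= 2;
    return flagged;
}
```

In this model, when the slot is still free at the time of the stale
write, both checks flag it; once the slot has been reallocated, only
the owner check does.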
Thanks,
Wengang