On Thu, 30 Sep 2021 16:27:34 -0600 Yu Zhao wrote:
>On Thu, Sep 30, 2021 at 7:06 AM Alexey Gladkov <legion@xxxxxxxxxx> wrote:
>>
>> On Wed, Sep 29, 2021 at 09:39:06PM +0000, Jordan Glover wrote:
>> > > I'm still investigating, but I would like to rule out one option.
>> > >
>> > > Could you check out the patch?
>> >
>> >
>> > Thx, I added it to my kernel and will report in few days.
>> > Does this patch try to fix the issue or make it easier to track?
>>
>> I suspect the error is caused by a race between alloc_ucounts() and
>> put_ucounts(). I think this patch could solve the problem.
>
>Thanks for your help. Still can reproduce the problem with the change suggested.
>
>[ 7761.885966] ==================================================================
>[ 7761.893462] BUG: KASAN: use-after-free in dec_ucount+0x50/0xd8
>[ 7761.899491] Write of size 8 at addr ffffff80c537b140 by task kworker/u16:3/10303
>[ 7761.907110]
>[ 7761.908668] CPU: 0 PID: 10303 Comm: kworker/u16:3 Not tainted 5.14.0-lockdep+ #1
>[ 7761.916289] Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
>[ 7761.923021] Workqueue: netns cleanup_net
>[ 7761.927106] Call trace:
>[ 7761.929648]  dump_backtrace+0x0/0x42c
>[ 7761.933442]  show_stack+0x24/0x30
>[ 7761.936878]  dump_stack_lvl+0xd0/0x100
>[ 7761.940763]  print_address_description+0x30/0x304
>[ 7761.945628]  kasan_report+0x190/0x1d8
>[ 7761.949418]  kasan_check_range+0x1ac/0x1bc
>[ 7761.953655]  __kasan_check_write+0x44/0x54
>[ 7761.957891]  dec_ucount+0x50/0xd8
>[ 7761.961334]  cleanup_net+0x630/0x718
>[ 7761.965036]  process_one_work+0x7b4/0x10ec
>[ 7761.969274]  worker_thread+0x800/0xcf4
>[ 7761.973152]  kthread+0x2a8/0x358
>[ 7761.976496]  ret_from_fork+0x10/0x18
>[ 7761.980201]
>[ 7761.981761] Allocated by task 4840:
>[ 7761.985366]  kasan_save_stack+0x38/0x68
>[ 7761.989342]  __kasan_kmalloc+0x9c/0xb8
>[ 7761.993222]  kmem_cache_alloc_trace+0x2a4/0x370
>[ 7761.997905]  alloc_ucounts+0x150/0x374
>[ 7762.001787]  set_cred_ucounts+0x198/0x248
>[ 7762.005935]  __sys_setresuid+0x31c/0x4f8
>[ 7762.009993]  __arm64_sys_setresuid+0x84/0x98
>[ 7762.014410]  invoke_syscall+0xd4/0x2c8
>[ 7762.018292]  el0_svc_common+0x124/0x200
>[ 7762.022265]  do_el0_svc_compat+0x54/0x64
>[ 7762.026325]  el0_svc_compat+0x24/0x34
>[ 7762.030124]  el0t_32_sync_handler+0xc0/0xf0
>[ 7762.034451]  el0t_32_sync+0x19c/0x1a0
>[ 7762.038241]
>[ 7762.039799] Freed by task 0:
>[ 7762.042778]  kasan_save_stack+0x38/0x68
>[ 7762.046747]  kasan_set_track+0x28/0x3c
>[ 7762.050625]  kasan_set_free_info+0x28/0x4c
>[ 7762.054857]  ____kasan_slab_free+0x118/0x164
>[ 7762.059277]  __kasan_slab_free+0x18/0x28
>[ 7762.063339]  kfree+0x2f8/0x500
>[ 7762.066505]  put_ucounts+0x11c/0x134
>[ 7762.070209]  put_cred_rcu+0x1bc/0x35c
>[ 7762.074006]  rcu_core+0xa68/0x1b20
>[ 7762.077538]  rcu_core_si+0x1c/0x28
>[ 7762.081061]  __do_softirq+0x4bc/0xedc
>[ 7762.084851]
>[ 7762.086401] The buggy address belongs to the object at ffffff80c537b100
>[ 7762.086401]  which belongs to the cache kmalloc-256 of size 256
>[ 7762.099267] The buggy address is located 64 bytes inside of
>[ 7762.099267]  256-byte region [ffffff80c537b100, ffffff80c537b200)
>[ 7762.111248] The buggy address belongs to the page:
>[ 7762.116185] page:fffffffe0314de00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffffff80c537ad00 pfn:0x145378
>[ 7762.127180] head:fffffffe0314de00 order:3 compound_mapcount:0 compound_pincount:0
>[ 7762.134881] flags: 0x8000000000010200(slab|head|zone=2)
>[ 7762.140286] raw: 8000000000010200 fffffffe02799408 fffffffe02020808 ffffff808000c980
>[ 7762.148263] raw: ffffff80c537ad00 0000000000200004 00000001ffffffff 0000000000000000
>[ 7762.156228] page dumped because: kasan: bad access detected
>[ 7762.161974]
>[ 7762.163532] Memory state around the buggy address:
>[ 7762.168475]  ffffff80c537b000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>[ 7762.175915]  ffffff80c537b080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>[ 7762.183346] >ffffff80c537b100: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>[ 7762.190774]                    ^
>[ 7762.196258]  ffffff80c537b180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>[ 7762.203689]  ffffff80c537b200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>[ 7762.211125] ==================================================================

Could you please check whether it is due to a count underflow? Nothing looks
wrong on the put side, given the earlier finding: "We looked through the users
of put_ucounts and we don't see any obvious buggy users that would be freeing
the data structure early."

Thanks
Hillf

--- linux-5.14.4/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -152,7 +152,10 @@ static void hlist_add_ucounts(struct uco

 struct ucounts *get_ucounts(struct ucounts *ucounts)
 {
-	if (ucounts && atomic_add_negative(1, &ucounts->count)) {
+	if (!ucounts)
+		return NULL;
+	WARN_ON(!atomic_read(&ucounts->count));
+	if (atomic_add_negative(1, &ucounts->count)) {
 		put_ucounts(ucounts);
 		ucounts = NULL;
 	}
--