Boris Sukholitko <boris.sukholitko@xxxxxxxxxxxx> writes:

> On Wed, May 27, 2020 at 12:58:05PM +0000, Luis Chamberlain wrote:
>> Eric since you authored the code which this code claims to fix, your
>> review would be appreciated.
>>
>> On Wed, May 27, 2020 at 01:48:48PM +0300, Boris Sukholitko wrote:
>> > Successful get_subdir returns dir with its header.nreg properly
>> > adjusted. No need to drop the dir in that case.
>>
>> This commit log is not that clear to me
>> can you explain what happens
>> without this patch, and how critical it is to fix it. How did you
>> notice this issue?
>
> Apologies for being too terse with my explanation. I'll try to expand
> below.
>
> In testing of our kernel (based on 4.19, tainted, sorry!) on our aarch64 based hardware
> we've come upon the following oops (lightly edited to omit irrelevant
> details):

How does your 4.19 proc_sysctl.c compare with the latest proc_sysctl.c?
Have you backported all of the most recent bug fixes?

> 000:50:01.133 Unable to handle kernel paging request at virtual address 0000000000007a12
> 000:50:02.209 Process brctl (pid: 14467, stack limit = 0x00000000bcf7a578)
> 000:50:02.209 CPU: 1 PID: 14467 Comm: brctl Tainted: P 4.19.122 #1
> 000:50:02.209 Hardware name: Broadcom-v8A (DT)
> 000:50:02.209 pstate: 60000005 (nZCv daif -PAN -UAO)
> 000:50:02.209 pc : unregister_sysctl_table+0x1c/0xa0
> 000:50:02.209 lr : unregister_net_sysctl_table+0xc/0x20
> 000:50:02.209 sp : ffffff800e5ab9e0
> 000:50:02.209 x29: ffffff800e5ab9e0 x28: ffffffc016439ec0
> 000:50:02.209 x27: 0000000000000000 x26: ffffff8008804078
> 000:50:02.209 x25: ffffff80087b4dd8 x24: ffffffc015d65000
> 000:50:02.209 x23: ffffffc01f0d6010 x22: ffffffc01f0d6000
> 000:50:02.209 x21: ffffffc0166c4eb0 x20: 00000000000000bd
> 000:50:02.209 x19: ffffffc01f0d6030 x18: 0000000000000400
> 000:50:02.256 x17: 0000000000000000 x16: 0000000000000000
> 000:50:02.256 x15: 0000000000000400 x14: 0000000000000129
> 000:50:02.256 x13: 0000000000000001 x12: 0000000000000030
> 000:50:02.256 x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> 000:50:02.256 x9 : feff646663687161 x8 : ffffffffffffffff
> 000:50:02.256 x7 : fefefefefefefefe x6 : 0000000000008080
> 000:50:02.256 x5 : 00000000ffffffff x4 : ffffff8008905c38
> 000:50:02.256 x3 : ffffffc01f0d602c x2 : 00000000000000bd
> 000:50:02.256 x1 : ffffffc01f0d60c0 x0 : 0000000000007a12
> 000:50:02.256 Call trace:
> 000:50:02.256 unregister_sysctl_table+0x1c/0xa0
> 000:50:02.256 unregister_net_sysctl_table+0xc/0x20
> 000:50:02.256 __devinet_sysctl_unregister.isra.0+0x2c/0x60
> 000:50:02.256 inetdev_event+0x198/0x510
> 000:50:02.256 notifier_call_chain+0x58/0xa0
> 000:50:02.303 raw_notifier_call_chain+0x14/0x20
> 000:50:02.303 call_netdevice_notifiers_info+0x34/0x80
> 000:50:02.303 rollback_registered_many+0x384/0x600
> 000:50:02.303 unregister_netdevice_queue+0x8c/0x110
> 000:50:02.303 br_dev_delete+0x8c/0xa0
> 000:50:02.303 br_del_bridge+0x44/0x70
> 000:50:02.303 br_ioctl_deviceless_stub+0xcc/0x310
> 000:50:02.303 sock_ioctl+0x194/0x3f0
> 000:50:02.303 compat_sock_ioctl+0x678/0xc00
> 000:50:02.303 __arm64_compat_sys_ioctl+0xf0/0xcb0
> 000:50:02.303 el0_svc_common+0x70/0x170
> 000:50:02.303 el0_svc_compat_handler+0x1c/0x30
> 000:50:02.303 el0_svc_compat+0x8/0x18
> 000:50:02.303 Code: a90153f3 aa0003f3 f9401000 b40000c0 (f9400001)
>
> The crash is in the call to count_subheaders(header->ctl_table_arg).
>
> Although the header (being in x19 == 0xffffffc01f0d6030) looks like a
> normal kernel pointer, ctl_table_arg (x0 == 0x0000000000007a12) looks
> invalid.
>
> Trying to find the issue, we've started tracing header allocation being
> done by kzalloc in __register_sysctl_table and header freeing being done
> in drop_sysctl_table.
>
> Then we've noticed headers being freed which were not allocated before.
> The faulty freeing was done on parent->header at the end of
> drop_sysctl_table.
>
> From this we've started to suspect some infelicity in header.nreg
> refcounting, thus leading us to the __register_sysctl_table fix in the
> patch.
>
> Here is a more detailed explanation of the fix.
>
> The current __register_sysctl_table logic looks like:
>
> 1. We start with some root dir, incrementing its header.nreg.
>
> 2. Then we find suitable dir using get_subdir function.
>
> 3. get_subdir decrements nreg on the parent dir and increments it on the
> dir being returned. See found label there.
>
> 4. We decrement dir's header.nreg for the symmetry with step 1.
>
> IMHO, the bug is on step 4. If another dir is being returned by
> get_subdir we decrement its nreg. I.e. the returned dir nreg stays 1
> despite having children added to it.
>
> This leads eventually to the innocent parent header being freed.
>

But the insertion of children in insert_header also increases the count
so it does not look like that should be true.

>> If you don't apply this patch what issue do you see?
>
> For some unexplained reason, the crashes are very rare and require
> stressing the system while creating and destroying network interfaces.
>
>>
>> Do we test for it? Can we?
>>
>
> With some printk tracing the issue is easy to see while doing simple
> brctl addbr / delbr to create and destroy bridge interface.
>
> Probably there is some SLUB debug option which may allow to catch the
> faulty free.

I see some recent (within the last year) fixes to proc_sysctl.c in this
area. Do you have those? It looks like bridge up and down is stressing
this code. Either those most recent fixes are wrong, your kernel is
missing them or this needs some more investigation.

Eric