On Wed, May 27, 2020 at 12:58:05PM +0000, Luis Chamberlain wrote: > Eric since you authored the code which this code claism to fix, your > review would be appreciated. > > On Wed, May 27, 2020 at 01:48:48PM +0300, Boris Sukholitko wrote: > > Successful get_subdir returns dir with its header.nreg properly > > adjusted. No need to drop the dir in that case. > > This commit log is not that clear to me > can you explain what happens > without this patch, and how critical it is to fix it. How did you > notice this issue? Apologies for being too terse with my explanation. I'll try to expand below. In testing of our kernel (based on 4.19, tainted, sorry!) on our aarch64 based hardware we've come upon the following oops (lightly edited to omit irrelevant details): 000:50:01.133 Unable to handle kernel paging request at virtual address 0000000000007a12 000:50:02.209 Process brctl (pid: 14467, stack limit = 0x00000000bcf7a578) 000:50:02.209 CPU: 1 PID: 14467 Comm: brctl Tainted: P 4.19.122 #1 000:50:02.209 Hardware name: Broadcom-v8A (DT) 000:50:02.209 pstate: 60000005 (nZCv daif -PAN -UAO) 000:50:02.209 pc : unregister_sysctl_table+0x1c/0xa0 000:50:02.209 lr : unregister_net_sysctl_table+0xc/0x20 000:50:02.209 sp : ffffff800e5ab9e0 000:50:02.209 x29: ffffff800e5ab9e0 x28: ffffffc016439ec0 000:50:02.209 x27: 0000000000000000 x26: ffffff8008804078 000:50:02.209 x25: ffffff80087b4dd8 x24: ffffffc015d65000 000:50:02.209 x23: ffffffc01f0d6010 x22: ffffffc01f0d6000 000:50:02.209 x21: ffffffc0166c4eb0 x20: 00000000000000bd 000:50:02.209 x19: ffffffc01f0d6030 x18: 0000000000000400 000:50:02.256 x17: 0000000000000000 x16: 0000000000000000 000:50:02.256 x15: 0000000000000400 x14: 0000000000000129 000:50:02.256 x13: 0000000000000001 x12: 0000000000000030 000:50:02.256 x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f 000:50:02.256 x9 : feff646663687161 x8 : ffffffffffffffff 000:50:02.256 x7 : fefefefefefefefe x6 : 0000000000008080 000:50:02.256 x5 : 00000000ffffffff x4 : ffffff8008905c38 000:50:02.256 x3 : ffffffc01f0d602c x2 : 00000000000000bd 000:50:02.256 x1 : ffffffc01f0d60c0 x0 : 0000000000007a12 000:50:02.256 Call trace: 000:50:02.256 unregister_sysctl_table+0x1c/0xa0 000:50:02.256 unregister_net_sysctl_table+0xc/0x20 000:50:02.256 __devinet_sysctl_unregister.isra.0+0x2c/0x60 000:50:02.256 inetdev_event+0x198/0x510 000:50:02.256 notifier_call_chain+0x58/0xa0 000:50:02.303 raw_notifier_call_chain+0x14/0x20 000:50:02.303 call_netdevice_notifiers_info+0x34/0x80 000:50:02.303 rollback_registered_many+0x384/0x600 000:50:02.303 unregister_netdevice_queue+0x8c/0x110 000:50:02.303 br_dev_delete+0x8c/0xa0 000:50:02.303 br_del_bridge+0x44/0x70 000:50:02.303 br_ioctl_deviceless_stub+0xcc/0x310 000:50:02.303 sock_ioctl+0x194/0x3f0 000:50:02.303 compat_sock_ioctl+0x678/0xc00 000:50:02.303 __arm64_compat_sys_ioctl+0xf0/0xcb0 000:50:02.303 el0_svc_common+0x70/0x170 000:50:02.303 el0_svc_compat_handler+0x1c/0x30 000:50:02.303 el0_svc_compat+0x8/0x18 000:50:02.303 Code: a90153f3 aa0003f3 f9401000 b40000c0 (f9400001) The crash is in the call to count_subheaders(header->ctl_table_arg). Although the header (being in x19 == 0xffffffc01f0d6030) looks like a normal kernel pointer, ctl_table_arg (x0 == 0x0000000000007a12) looks invalid. Trying to find the issue, we've started tracing header allocation being done by kzalloc in __register_sysctl_table and header freeing being done in drop_sysctl_table. Then we've noticed headers being freed which where not allocated before. The faulty freeing was done on parent->header at the end of drop_sysctl_table. >From this we've started to suspect some infelicity in header.nreg refcounting, thus leading us the __register_sysctl_table fix in the patch. Here is more detailed explanation of the fix. The current __register_sysctl_table logic looks like: 1. We start with some root dir, incrementing its header.nreg. 2. Then we find suitable dir using get_subdir function. 3. get_subdir decrements nreg on the parent dir and increments it on the dir being returned. See found label there. 4. We decrement dir's header.nreg for the symmetry with step 1. IMHO, the bug is on step 4. If another dir is being returned by get_subdir we decrement its nreg. I.e. the returned dir nreg stays 1 despite having children added to it. This leads eventually to the innocent parent header being freed. > If you don't apply this patch what issue do you see? For some unexplained reason, the crashes are very rare and require stressing the system while creating and destroing network interfaces. > > Do we test for it? Can we? > With some printk tracing the issue is easy to see while doing simple brctl addbr / delbr to create and destroy bridge interface. Probably there is some SLUB debug option which may allow to catch the faulty free. Thanks, Boris. > Luis > > > > > Signed-off-by: Boris Sukholitko <boris.sukholitko@xxxxxxxxxxxx> > > Fixes: 7ec66d06362d (sysctl: Stop requiring explicit management of sysctl directories) > > --- > > fs/proc/proc_sysctl.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c > > index b6f5d459b087..6f237f0eb531 100644 > > --- a/fs/proc/proc_sysctl.c > > +++ b/fs/proc/proc_sysctl.c > > @@ -1286,8 +1286,8 @@ struct ctl_table_header *__register_sysctl_table( > > { > > struct ctl_table_root *root = set->dir.header.root; > > struct ctl_table_header *header; > > + struct ctl_dir *dir, *start_dir; > > const char *name, *nextname; > > - struct ctl_dir *dir; > > struct ctl_table *entry; > > struct ctl_node *node; > > int nr_entries = 0; > > @@ -1307,6 +1307,7 @@ struct ctl_table_header *__register_sysctl_table( > > > > spin_lock(&sysctl_lock); > > dir = &set->dir; > > + start_dir = dir; > > /* Reference moved down the diretory tree get_subdir */ > > dir->header.nreg++; > > spin_unlock(&sysctl_lock); > > @@ -1333,7 +1334,8 @@ struct ctl_table_header *__register_sysctl_table( > > if (insert_header(dir, header)) > > goto fail_put_dir_locked; > > > > - drop_sysctl_table(&dir->header); > > + if (start_dir == dir) > > + drop_sysctl_table(&dir->header); > > spin_unlock(&sysctl_lock); > > > > return header; > > -- > > 2.23.1 > >