On Wed, Aug 26, 2020 at 8:03 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 8/24/20 8:01 PM, Muchun Song wrote:
> > On Tue, Aug 25, 2020 at 5:21 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> >>
> >> I too am looking at this now and do not completely understand the race.
> >> It could be that:
> >>
> >> hugetlb_sysctl_handler_common
> >> ...
> >>         table->data = &tmp;
> >>
> >> and, do_proc_doulongvec_minmax()
> >> ...
> >>         return __do_proc_doulongvec_minmax(table->data, table, write, ...
> >> with __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, ...
> >> ...
> >>         i = (unsigned long *) data;
> >> ...
> >>         *i = val;
> >>
> >> So, __do_proc_doulongvec_minmax can be dereferencing and writing to the pointer
> >> in one thread when hugetlb_sysctl_handler_common is setting it in another?
> >
> > Yes, you are right.
> >
> >>
> >> Another confusing part of the message is the stack trace which includes
> >> ...
> >>  ? set_max_huge_pages+0x3da/0x4f0
> >>  ? alloc_pool_huge_page+0x150/0x150
> >>
> >> which are 'downstream' from these routines.  I don't understand why these
> >> are in the trace.
> >
> > I am also confused. But this issue can be reproduced easily by letting more
> > than one thread write to `/proc/sys/vm/nr_hugepages`. With this patch applied,
> > the issue can not be reproduced and disappears.
>
> There certainly is an issue here as one thread can modify data in another.
> However, I am having a hard time seeing what causes the 'kernel NULL pointer
> dereference'.

If you write 0 to '/proc/sys/vm/nr_hugepages', you will get the

    kernel NULL pointer dereference, address: 0000000000000000

If you write 1024 to '/proc/sys/vm/nr_hugepages', you will get the

    kernel NULL pointer dereference, address: 0000000000000400

The address of dereference is the value which you write to the
'/proc/sys/vm/nr_hugepages'.

>
> I tried to reproduce the issue myself but was unsuccessful.
> I have 16 threads
> writing to /proc/sys/vm/nr_hugepages in an infinite loop.  After several hours
> running, I did not hit the issue.  Just curious, what architecture is the
> system?  any special config or compiler options?
>
> If you can easily reproduce, can you post the detailed oops message?
>
> The 'NULL pointer' seems strange because after the first assignment to
> table->data the value should never be NULL.  Certainly it can be modified
> by another thread, but I can not see how it can be NULL.  At the beginning
> of __do_proc_doulongvec_minmax, there is a check for NULL pointer with:

CPU0:                                 CPU1:
                                      proc_sys_write
hugetlb_sysctl_handler                  proc_sys_call_handler
hugetlb_sysctl_handler_common             hugetlb_sysctl_handler
  table->data = &tmp;                       hugetlb_sysctl_handler_common
                                              table->data = &tmp;
    proc_doulongvec_minmax
      do_proc_doulongvec_minmax             sysctl_head_finish
        __do_proc_doulongvec_minmax
          i = table->data;
          *i = val;     // corrupt CPU1 stack

If the val is 0, you will see the NULL.

>
>         if (!data || !table->maxlen || !*lenp || (*ppos && !write)) {
>                 *lenp = 0;
>                 return 0;
>         }
>
> I looked at the code my compiler produced for __do_proc_doulongvec_minmax.
> It appears to use the same value/register for the pointer throughout the
> routine.  IOW, I do not see how the pointer can be NULL for the assignment
> when the routine does:
>
>         *i = val;
>
> Again, your analysis/patch points out a real issue.  I just want to get
> a better understanding to make sure there is not another issue causing
> the NULL pointer dereference.

Below is my test script. There are 8 threads to execute the following script.
In my qemu, it is easy to panic. Thanks.

#!/bin/sh

while :
do
        echo 128 > /proc/sys/vm/nr_hugepages
        echo 0 > /proc/sys/vm/nr_hugepages
done

> --
> Mike Kravetz

--
Yours,
Muchun
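
The CPU0/CPU1 interleaving described above can be replayed deterministically
in user space. The following is a minimal sketch, not kernel code: struct
fake_ctl_table, fake_store(), replay_race() and replay_with_private_table()
are hypothetical names invented for illustration, and the "private copy"
variant is only one possible way to avoid the sharing; it is an assumption,
not necessarily what the patch under discussion does.

```c
#include <stddef.h>

/* Hypothetical stand-in for the single ctl_table field that matters here. */
struct fake_ctl_table {
	void *data;
};

/* Models the "i = data; *i = val;" step of __do_proc_doulongvec_minmax(). */
static void fake_store(void *data, unsigned long val)
{
	unsigned long *i = data;

	*i = val;
}

/*
 * Replays the racy ordering in a single thread.  cpu0_tmp and cpu1_tmp
 * stand in for each caller's on-stack 'tmp'.  Returns what ends up in
 * CPU1's tmp after CPU0 performs its store.
 */
unsigned long replay_race(unsigned long cpu0_val, unsigned long cpu1_initial)
{
	struct fake_ctl_table table = { NULL };	/* shared between callers */
	unsigned long cpu0_tmp = 0;
	unsigned long cpu1_tmp = cpu1_initial;

	table.data = &cpu0_tmp;			/* CPU0: table->data = &tmp; */
	table.data = &cpu1_tmp;			/* CPU1: table->data = &tmp; */
	fake_store(table.data, cpu0_val);	/* CPU0 writes via CPU1's pointer */

	return cpu1_tmp;
}

/*
 * Same ordering, but each caller works on its own copy of the table
 * (an assumed remedy), so CPU0's store stays in CPU0's tmp.
 */
unsigned long replay_with_private_table(unsigned long cpu0_val,
					unsigned long cpu1_initial)
{
	struct fake_ctl_table shared = { NULL };
	struct fake_ctl_table cpu0_copy = shared;
	struct fake_ctl_table cpu1_copy = shared;
	unsigned long cpu0_tmp = 0;
	unsigned long cpu1_tmp = cpu1_initial;

	cpu0_copy.data = &cpu0_tmp;
	cpu1_copy.data = &cpu1_tmp;
	fake_store(cpu0_copy.data, cpu0_val);

	return cpu1_tmp;
}
```

With the shared table, replay_race(1024, 128) leaves 1024 (0x400) in CPU1's
slot, and replay_race(0, 128) leaves 0. That is consistent with the fault
address matching the written value, provided the overwritten stack slot is
later used as a pointer. With private copies, CPU1's value is left untouched.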