Re: [External] Re: [PATCH] mm/hugetlb: Fix a race between hugetlb sysctl handlers

Muchun Song <songmuchun@xxxxxxxxxxxxx> · Tue, 25 Aug 2020 11:01:17 +0800

On Tue, Aug 25, 2020 at 5:21 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 8/24/20 1:59 PM, Andrew Morton wrote:
> > On Sat, 22 Aug 2020 17:53:28 +0800 Muchun Song <songmuchun@xxxxxxxxxxxxx> wrote:
> >
> >> There is a race between the assignment of `table->data` and write value
> >> to the pointer of `table->data` in the __do_proc_doulongvec_minmax().
> >
> > Where does __do_proc_doulongvec_minmax() write to table->data?
> >
> > I think you're saying that there is a race between the assignment of
> > ctl_table->table in hugetlb_sysctl_handler_common() and the assignment
> > of the same ctl_table->table in hugetlb_overcommit_handler()?
> >
> > Or not, maybe I'm being thick.  Can you please describe the race more
> > carefully and completely?
> >
>
> I too am looking at this now and do not completely understand the race.
> It could be that:
>
> hugetlb_sysctl_handler_common
> ...
>         table->data = &tmp;
>
> and, do_proc_doulongvec_minmax()
> ...
>         return __do_proc_doulongvec_minmax(table->data, table, write, ...
> with __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, ...
> ...
>         i = (unsigned long *) data;
>         ...
>                 *i = val;
>
> So, __do_proc_doulongvec_minmax can be dereferencing and writing to the pointer
> in one thread when hugetlb_sysctl_handler_common is setting it in another?

Yes, you are right.

>
> Another confusing part of the message is the stack trace which includes
> ...
>      ? set_max_huge_pages+0x3da/0x4f0
>      ? alloc_pool_huge_page+0x150/0x150
>
> which are 'downstream' from these routines.  I don't understand why these
> are in the trace.

I am also confused. But this issue can be reproduced easily by letting more
than one thread write to `/proc/sys/vm/nr_hugepages`. With this patch applied,
the issue can not be reproduced and disappears.

>
> If the race is with the pointer set and dereference/write, then this type of
> fix is OK.  However, if you really have two 'sysadmin type' global operations
> racing then one or both are not going to get what they expected.  Instead of

In our team, more than one developer shares one server which is a test server.
I guess that the panic happens when two people write
`/proc/sys/vm/nr_hugepages`.

> changing the code to 'handle the race', I think it might be acceptable to just
> put a big semaphore around it.

It is also a good idea to fix this issue. Thanks.

> --
> Mike Kravetz

-- 
Yours,
Muchun