[PATCH] percpu: km: Use for SMP+NOMMU

Vladimir Murzin <vladimir.murzin@xxxxxxx> · Tue, 30 Nov 2021 17:29:53 +0000

I have recently updated kernel for R-class (SMP+NOMMU) and observed
following build failure:

arm-none-linux-gnueabihf-ld: mm/percpu.o: in function `pcpu_post_unmap_tlb_flush':
mm/percpu-vm.c:188: undefined reference to `flush_tlb_kernel_range'

ARM NOMMU declare flush_tlb_kernel_range() but doesn't define it, so I
tried to understand what pulls that function...

First candidate is 93274f1dd6b0 ("percpu: flush tlb in
pcpu_reclaim_populated()") which added another user of
pcpu_post_unmap_tlb_flush(), yet simple revert did not make things
better...

The second candidate turned to be 4ad0ae8c64ac ("mm/vmalloc: remove
unmap_kernel_range"), more precisely NOMMU part. This one is
interesting. Before conversion to vmap_pages_range_noflush() we had

static inline int
map_kernel_range_noflush(unsigned long start, unsigned long size,
                        pgprot_t prot, struct page **pages)
{
        return size >> PAGE_SHIFT;
}

static int __pcpu_map_pages(unsigned long addr, struct page **pages,
                            int nr_pages)
{
        return map_kernel_range_noflush(addr, nr_pages << PAGE_SHIFT,
                                        PAGE_KERNEL, pages);
}

static int pcpu_map_pages(struct pcpu_chunk *chunk,
                          struct page **pages, int page_start, int page_end)
{
        unsigned int cpu, tcpu;
        int i, err;

        for_each_possible_cpu(cpu) {
                err = __pcpu_map_pages(pcpu_chunk_addr(chunk, cpu, page_start),
                                       &pages[pcpu_page_idx(cpu, page_start)],
                                       page_end - page_start);
                if (err < 0)
                        goto err;

                for (i = page_start; i < page_end; i++)
                        pcpu_set_page_chunk(pages[pcpu_page_idx(cpu, i)],
                                            chunk);
        }
        return 0;
err:
        for_each_possible_cpu(tcpu) {
                if (tcpu == cpu)
                        break;
                __pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
                                   page_end - page_start);
        }
        pcpu_post_unmap_tlb_flush(chunk, page_start, page_end);
        return err;
}

Here __pcpu_map_pages() would never return negative value, so
compiler optimizes error path and pcpu_post_unmap_tlb_flush() never
referenced.

After conversion to vmap_pages_range_noflush() we got

static inline
int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
                pgprot_t prot, struct page **pages, unsigned int page_shift)
{
       return -EINVAL;
}

static int __pcpu_map_pages(unsigned long addr, struct page **pages,
                            int nr_pages)
{
        return vmap_pages_range_noflush(addr, addr + (nr_pages << PAGE_SHIFT),
                                        PAGE_KERNEL, pages, PAGE_SHIFT);
}

and pcpu_map_pages() unchanged.

Here __pcpu_map_pages() always return negative value, so compiler
cannot optimise error path and pcpu_post_unmap_tlb_flush() stay
referenced.

Now it is kind of clear why it worked before and why it refuses to build.

Next is to understand how to fix that. I noticed [1] following comment
from Nicholas:

> Previous code had a strange NOMMU implementation of
> map_kernel_range_noflush that came in with commit b554cb426a955
> ("NOMMU: support SMP dynamic percpu_alloc") which would return
> success if the size was <= PAGE_SIZE, but that has no way of working
> on NOMMU because the page can not be mapped to the new start address
> even if there is only one of them. So change this code to always
> return failure.
>
> NOMMU probably needs to take a closer look at what it does with
> percpu-vm.c and carve out a special case there if necessary rather
> than pretend vmap works.

So I started looking into it. First thing I noticed is than UP+NOMMU
doesn't run into the same issue. The reason is that they pull
mm/percpu-km.c with

config NEED_PER_CPU_KM
        depends on !SMP
        bool
        default y

Looking more into history of kernel memory based allocator it looks
like it was designed with SMP in mind at least it is mentioned in
b0c9778b1d07 ("percpu: implement kernel memory based chunk
allocation"):

> Implement an alternate percpu chunk management based on kernel memeory
> for nommu SMP architectures. 

... and even latter when NEED_PER_CPU_KM was introduced by
bbddff054587 ("percpu: use percpu allocator on UP too"):

> Currently, users of percpu allocators need to handle UP differently,
> which is somewhat fragile and ugly.  Other than small amount of
> memory, there isn't much to lose by enabling percpu allocator on UP.
> It can simply use kernel memory based chunk allocation which was added
> for SMP archs w/o MMUs.

I could not find justification on prohibiting SMP+NOMMU in patch
discussion [2] either. It looks like that dependency was overlooked and
did not consider SMP+NOMMU.

This probably also was a reason for b554cb426a95 ("NOMMU: support SMP
dynamic percpu_alloc"), yet patch author and reviewers were also included
into original b0c9778b1d07 ("percpu: implement kernel memory based chunk
allocation")...

Unless I'm missing something I'm proposing bringing kernel memory based
allocator back for use with SMP+NOMMU

Vladimir Murzin (1):
  percpu: km: ensure it is used with NOMMU (either UP or SMP)

 mm/Kconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

-- 
2.7.4