On Wed, Oct 16, 2024 at 2:09 PM maobibo <maobibo@xxxxxxxxxxx> wrote:
>
>
>
> On 2024/10/15 8:27 PM, Huacai Chen wrote:
> > On Tue, Oct 15, 2024 at 10:54 AM maobibo <maobibo@xxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 2024/10/14 2:31 PM, Huacai Chen wrote:
> >>> Hi, Bibo,
> >>>
> >>> On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@xxxxxxxxxxx> wrote:
> >>>>
> >>>> It is possible to return a spurious fault if memory is accessed
> >>>> right after the pte is set. For user address space, pte is set
> >>>> in kernel space and memory is accessed in user space, there is
> >>>> long time for synchronization, no barrier needed. However for
> >>>> kernel address space, it is possible that memory is accessed
> >>>> right after the pte is set.
> >>>>
> >>>> Here flush_cache_vmap/flush_cache_vmap_early is used for
> >>>> synchronization.
> >>>>
> >>>> Signed-off-by: Bibo Mao <maobibo@xxxxxxxxxxx>
> >>>> ---
> >>>>  arch/loongarch/include/asm/cacheflush.h | 14 +++++++++++++-
> >>>>  1 file changed, 13 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
> >>>> index f8754d08a31a..53be231319ef 100644
> >>>> --- a/arch/loongarch/include/asm/cacheflush.h
> >>>> +++ b/arch/loongarch/include/asm/cacheflush.h
> >>>> @@ -42,12 +42,24 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
> >>>>  #define flush_cache_dup_mm(mm)				do { } while (0)
> >>>>  #define flush_cache_range(vma, start, end)		do { } while (0)
> >>>>  #define flush_cache_page(vma, vmaddr, pfn)		do { } while (0)
> >>>> -#define flush_cache_vmap(start, end)			do { } while (0)
> >>>>  #define flush_cache_vunmap(start, end)			do { } while (0)
> >>>>  #define flush_icache_user_page(vma, page, addr, len)	do { } while (0)
> >>>>  #define flush_dcache_mmap_lock(mapping)			do { } while (0)
> >>>>  #define flush_dcache_mmap_unlock(mapping)		do { } while (0)
> >>>>
> >>>> +/*
> >>>> + * It is possible for a kernel virtual mapping access to return a spurious
> >>>> + * fault if it's accessed right after the pte is set. The page fault handler
> >>>> + * does not expect this type of fault. flush_cache_vmap is not exactly the
> >>>> + * right place to put this, but it seems to work well enough.
> >>>> + */
> >>>> +static inline void flush_cache_vmap(unsigned long start, unsigned long end)
> >>>> +{
> >>>> +	smp_mb();
> >>>> +}
> >>>> +#define flush_cache_vmap flush_cache_vmap
> >>>> +#define flush_cache_vmap_early flush_cache_vmap
> >>> From the history of flush_cache_vmap_early(), it seems only archs with
> >>> "virtual cache" (VIVT or VIPT) need this API, so LoongArch can be a
> >>> no-op here.
> > OK, flush_cache_vmap_early() also needs smp_mb().
> >
> >>
> >> Here is the usage of flush_cache_vmap_early in file linux/mm/percpu.c:
> >> the page is mapped and accessed immediately. Do you think it should be
> >> a no-op on LoongArch?
> >>
> >>		rc = __pcpu_map_pages(unit_addr, &pages[unit * unit_pages],
> >>				      unit_pages);
> >>		if (rc < 0)
> >>			panic("failed to map percpu area, err=%d\n", rc);
> >>		flush_cache_vmap_early(unit_addr, unit_addr + ai->unit_size);
> >>		/* copy static data */
> >>		memcpy((void *)unit_addr, __per_cpu_load, ai->static_size);
> >>	}
> >>
> >>
> >>>
> >>> And I still think flush_cache_vunmap() should be a smp_mb(). A
> >>> smp_mb() in flush_cache_vmap() prevents subsequent accesses from being
> >>> reordered before pte_set(), and a smp_mb() in flush_cache_vunmap()
> >> smp_mb() in flush_cache_vmap() does not prevent reordering. It is to
> >> flush the pipeline and let the page table walker HW sync with the data
> >> cache.
> >>
> >> Take the following example:
> >>	rb = vmap(pages, nr_meta_pages + 2 * nr_data_pages,
> >>		  VM_MAP | VM_USERMAP, PAGE_KERNEL);
> >>	if (rb) {
> >> <<<<<<<<<<< * the statement if (rb) can prevent reordering. Otherwise,
> >> with any API kmalloc/vmap/vmalloc and a subsequent memory access, there
> >> would be a reordering issue. *
> >>		kmemleak_not_leak(pages);
> >>		rb->pages = pages;
> >>		rb->nr_pages = nr_pages;
> >>		return rb;
> >>	}
> >>
> >>> prevents preceding accesses from being reordered after pte_clear(). This
> >> Can you give an example of such usage of flush_cache_vunmap()? Then we
> >> can continue to talk about it; otherwise it is just guessing.
> > Since we cannot reach a consensus, and the flush_cache_* API looks very
> > strange for this purpose (yes, I know PowerPC does it like this, but
> > ARM64 doesn't), I prefer to still use the ARM64 method, which means
> > adding a dbar in set_pte(). Of course the performance will be a little
> > worse, but still better than the old version, and it is more robust.
> >
> > I know you are very busy, so if you have no time you don't need to
> > send V3, I can just do a small modification on the 3rd patch.
> No, I will send V3 by myself. And I will drop this patch from the
> patchset, since by actual test vmalloc_test works well even without this
> patch on a dual-way 3C5000; also the weak function kernel_pte_init will
> be replaced with an inline function rebased on
>
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-define-general-function-pxd_init.patch
This patch is in Andrew's mm-unstable branch. As far as I know,
mm-unstable is for next (6.13) and mm-stable is for current (6.12).
But this series is a bugfix, so it is for current (6.12).

>
> I dislike the copy-paste method without further understanding :(,
> although I also copy and paste code, but at least I try my best to
> understand it.
I dislike it too. But in order to get this series into 6.12, it is better
to keep the copy-paste, and then update the refactoring patch to V2 for
Andrew (rebasing and dropping is normal for mm-unstable).


Huacai

>
> Regards
> Bibo Mao
>
>
> >
> > Huacai
> >
> >>
> >> Regards
> >> Bibo Mao
> >>> potential problem may not be seen from experiment, but it is needed in
> >>> theory.
> >>>
> >>> Huacai
> >>>
> >>>> +
> >>>>  #define cache_op(op, addr)					\
> >>>>  	__asm__ __volatile__(					\
> >>>>  	"	cacop	%0, %1					\n"	\
> >>>> --
> >>>> 2.39.3
> >>>>
> >>>>
> >>
> >>
>
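
For readers following the thread, below is a minimal sketch of the "dbar in
set_pte()" approach Huacai refers to above. It is only an illustration of the
idea, not the actual V3 patch: the _PAGE_GLOBAL test used to restrict the
barrier to kernel mappings, and the use of smp_mb() (a dbar-based full barrier
on LoongArch) are assumptions made here for clarity; the real set_pte() on
LoongArch contains additional logic not shown.

	/*
	 * Illustrative sketch only (not the actual patch): enforce the
	 * ordering at the place where the kernel PTE is written, instead
	 * of relying on flush_cache_vmap()/flush_cache_vunmap().
	 */
	static inline void set_pte(pte_t *ptep, pte_t pteval)
	{
		WRITE_ONCE(*ptep, pteval);

		/*
		 * Kernel mappings carry _PAGE_GLOBAL (assumed heuristic):
		 * issue a full barrier so the page table walker observes
		 * the new entry before any subsequent access to the
		 * freshly mapped range.
		 */
		if (pte_val(pteval) & _PAGE_GLOBAL)
			smp_mb();
	}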