On Wed, Oct 16, 2024 at 2:09 PM maobibo <maobibo@xxxxxxxxxxx> wrote:
>
>
>
> On 2024/10/15 8:27 PM, Huacai Chen wrote:
> > On Tue, Oct 15, 2024 at 10:54 AM maobibo <maobibo@xxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 2024/10/14 2:31 PM, Huacai Chen wrote:
> >>> Hi, Bibo,
> >>>
> >>> On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@xxxxxxxxxxx> wrote:
> >>>>
> >>>> It is possible to return a spurious fault if memory is accessed
> >>>> right after the pte is set. For user address space, pte is set
> >>>> in kernel space and memory is accessed in user space, there is
> >>>> long time for synchronization, no barrier needed. However for
> >>>> kernel address space, it is possible that memory is accessed
> >>>> right after the pte is set.
> >>>>
> >>>> Here flush_cache_vmap/flush_cache_vmap_early is used for
> >>>> synchronization.
> >>>>
> >>>> Signed-off-by: Bibo Mao <maobibo@xxxxxxxxxxx>
> >>>> ---
> >>>>  arch/loongarch/include/asm/cacheflush.h | 14 +++++++++++++-
> >>>>  1 file changed, 13 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
> >>>> index f8754d08a31a..53be231319ef 100644
> >>>> --- a/arch/loongarch/include/asm/cacheflush.h
> >>>> +++ b/arch/loongarch/include/asm/cacheflush.h
> >>>> @@ -42,12 +42,24 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
> >>>>  #define flush_cache_dup_mm(mm)				do { } while (0)
> >>>>  #define flush_cache_range(vma, start, end)		do { } while (0)
> >>>>  #define flush_cache_page(vma, vmaddr, pfn)		do { } while (0)
> >>>> -#define flush_cache_vmap(start, end)			do { } while (0)
> >>>>  #define flush_cache_vunmap(start, end)			do { } while (0)
> >>>>  #define flush_icache_user_page(vma, page, addr, len)	do { } while (0)
> >>>>  #define flush_dcache_mmap_lock(mapping)			do { } while (0)
> >>>>  #define flush_dcache_mmap_unlock(mapping)		do { } while (0)
> >>>>
> >>>> +/*
> >>>> + * It is possible for a kernel virtual mapping access to return a spurious
> >>>> + * fault if it's accessed right after the pte is set. The page fault handler
> >>>> + * does not expect this type of fault. flush_cache_vmap is not exactly the
> >>>> + * right place to put this, but it seems to work well enough.
> >>>> + */
> >>>> +static inline void flush_cache_vmap(unsigned long start, unsigned long end)
> >>>> +{
> >>>> +	smp_mb();
> >>>> +}
> >>>> +#define flush_cache_vmap flush_cache_vmap
> >>>> +#define flush_cache_vmap_early flush_cache_vmap
> >>> From the history of flush_cache_vmap_early(), it seems only archs with
> >>> "virtual cache" (VIVT or VIPT) need this API, so LoongArch can be a
> >>> no-op here.
> > OK, flush_cache_vmap_early() also needs smp_mb().
> >
> >>
> >> Here is the usage of flush_cache_vmap_early in file linux/mm/percpu.c:
> >> the page is mapped and accessed immediately. Do you think it should be
> >> a no-op on LoongArch?
> >>
> >>		rc = __pcpu_map_pages(unit_addr, &pages[unit * unit_pages],
> >>				      unit_pages);
> >>		if (rc < 0)
> >>			panic("failed to map percpu area, err=%d\n", rc);
> >>		flush_cache_vmap_early(unit_addr, unit_addr + ai->unit_size);
> >>		/* copy static data */
> >>		memcpy((void *)unit_addr, __per_cpu_load, ai->static_size);
> >>	}
> >>
> >>
> >>>
> >>> And I still think flush_cache_vunmap() should be a smp_mb(). A
> >>> smp_mb() in flush_cache_vmap() prevents subsequent accesses from being
> >>> reordered before pte_set(), and a smp_mb() in flush_cache_vunmap()
> >> smp_mb() in flush_cache_vmap() does not prevent reordering. It is to
> >> flush the pipeline and let the page table walker HW sync with the data
> >> cache.
> >>
> >> Take the following example:
> >>	rb = vmap(pages, nr_meta_pages + 2 * nr_data_pages,
> >>		  VM_MAP | VM_USERMAP, PAGE_KERNEL);
> >>	if (rb) {
> >> <<<<<<<<<<< * the statement if (rb) can prevent reordering. Otherwise,
> >> with any API kmalloc/vmap/vmalloc and a subsequent memory access, there
> >> would be a reordering issue. *
> >>		kmemleak_not_leak(pages);
> >>		rb->pages = pages;
> >>		rb->nr_pages = nr_pages;
> >>		return rb;
> >>	}
> >>
> >>> prevents preceding accesses from being reordered after pte_clear(). This
> >> Can you give an example of such usage of flush_cache_vunmap()? Then we
> >> can continue to talk about it; otherwise it is just guessing.
> > Since we cannot reach a consensus, and the flush_cache_* API looks very
> > strange for this purpose (yes, I know PowerPC does it like this, but
> > ARM64 doesn't), I prefer to still use the ARM64 method, which means
> > adding a dbar in set_pte(). Of course the performance will be a little
> > worse, but still better than the old version, and it is more robust.
> >
> > I know you are very busy, so if you have no time you don't need to
> > send V3, I can just do a small modification on the 3rd patch.
> No, I will send V3 by myself. And I will drop this patch from the
> patchset, since by actual test vmalloc_test works well even without this
> patch on a dual-way 3C5000; also the weak function kernel_pte_init will
> be replaced with an inline function rebased on
>
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-define-general-function-pxd_init.patch
This patch is in Andrew's mm-unstable branch. As far as I know,
mm-unstable is for next (6.13) and mm-stable is for current (6.12).
But this series is a bugfix, so it is for current (6.12).

>
> I dislike the copy-paste method without further understanding :(,
> although I also copy and paste code, but at least I try my best to
> understand it.
I dislike it too. But in order to get this series into 6.12, it is better
to keep the copy-paste, and then update the refactoring patch to V2 for
Andrew (rebasing and dropping is normal for mm-unstable).


Huacai

>
> Regards
> Bibo Mao
>
>
> >
> > Huacai
> >
> >>
> >> Regards
> >> Bibo Mao
> >>> potential problem may not be seen from experiment, but it is needed in
> >>> theory.
> >>>
> >>> Huacai
> >>>
> >>>> +
> >>>>  #define cache_op(op, addr)					\
> >>>>  	__asm__ __volatile__(					\
> >>>>  	"	cacop	%0, %1					\n"	\
> >>>> --
> >>>> 2.39.3
> >>>>
> >>>>
> >>
> >>
>
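
For readers following the thread, below is a minimal sketch of the "dbar in
set_pte()" approach Huacai refers to above. It is only an illustration of the
idea, not the actual V3 patch: the _PAGE_GLOBAL test used to restrict the
barrier to kernel mappings, and the use of smp_mb() (a dbar-based full barrier
on LoongArch) are assumptions made here for clarity; the real set_pte() on
LoongArch contains additional logic not shown.

	/*
	 * Illustrative sketch only (not the actual patch): enforce the
	 * ordering at the place where the kernel PTE is written, instead
	 * of relying on flush_cache_vmap()/flush_cache_vunmap().
	 */
	static inline void set_pte(pte_t *ptep, pte_t pteval)
	{
		WRITE_ONCE(*ptep, pteval);

		/*
		 * Kernel mappings carry _PAGE_GLOBAL (assumed heuristic):
		 * issue a full barrier so the page table walker observes
		 * the new entry before any subsequent access to the
		 * freshly mapped range.
		 */
		if (pte_val(pteval) & _PAGE_GLOBAL)
			smp_mb();
	}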