On Mon, Oct 21, 2024 at 9:23 AM maobibo <maobibo@xxxxxxxxxxx> wrote:
>
>
>
> On 2024/10/18 2:32 PM, Huacai Chen wrote:
> > On Fri, Oct 18, 2024 at 2:23 PM maobibo <maobibo@xxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 2024/10/18 12:23 PM, Huacai Chen wrote:
> >>> On Fri, Oct 18, 2024 at 12:16 PM maobibo <maobibo@xxxxxxxxxxx> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2024/10/18 12:11 PM, Huacai Chen wrote:
> >>>>> On Fri, Oct 18, 2024 at 11:44 AM maobibo <maobibo@xxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2024/10/18 11:14 AM, Huacai Chen wrote:
> >>>>>>> Hi, Bibo,
> >>>>>>>
> >>>>>>> I applied this patch but dropped the part for arch/loongarch/mm/kasan_init.c:
> >>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/commit/?h=loongarch-next&id=15832255e84494853f543b4c70ced50afc403067
> >>>>>>>
> >>>>>>> Because kernel_pte_init() should operate on page-table pages, not on
> >>>>>>> data pages. You have already handled page-table pages in
> >>>>>>> mm/kasan/init.c, and if we don't drop the modification on data pages
> >>>>>>> in arch/loongarch/mm/kasan_init.c, the kernel fails to boot if KASAN is
> >>>>>>> enabled.
> >>>>>>>
> >>>>>> static inline void set_pte(pte_t *ptep, pte_t pteval)
> >>>>>> {
> >>>>>> 	WRITE_ONCE(*ptep, pteval);
> >>>>>> -
> >>>>>> -	if (pte_val(pteval) & _PAGE_GLOBAL) {
> >>>>>> -		pte_t *buddy = ptep_buddy(ptep);
> >>>>>> -		/*
> >>>>>> -		 * Make sure the buddy is global too (if it's !none,
> >>>>>> -		 * it better already be global)
> >>>>>> -		 */
> >>>>>> -		if (pte_none(ptep_get(buddy))) {
> >>>>>> -#ifdef CONFIG_SMP
> >>>>>> -			/*
> >>>>>> -			 * For SMP, multiple CPUs can race, so we need
> >>>>>> -			 * to do this atomically.
> >>>>>> -			 */
> >>>>>> -			__asm__ __volatile__(
> >>>>>> -			__AMOR "$zero, %[global], %[buddy] \n"
> >>>>>> -			: [buddy] "+ZB" (buddy->pte)
> >>>>>> -			: [global] "r" (_PAGE_GLOBAL)
> >>>>>> -			: "memory");
> >>>>>> -
> >>>>>> -			DBAR(0b11000); /* o_wrw = 0b11000 */
> >>>>>> -#else /* !CONFIG_SMP */
> >>>>>> -			WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
> >>>>>> -#endif /* CONFIG_SMP */
> >>>>>> -		}
> >>>>>> -	}
> >>>>>> +	DBAR(0b11000); /* o_wrw = 0b11000 */
> >>>>>> }
> >>>>>>
> >>>>>> No, please hold on. This issue has existed for about twenty years; do we
> >>>>>> need to be in such a hurry now?
> >>>>>>
> >>>>>> Why is DBAR(0b11000) added in set_pte()?
> >>>>> It existed before; it was not added by this patch. The reason is explained in
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030
> >>>> Why would speculative accesses cause a spurious page fault in kernel space
> >>>> with HW PTW enabled? Speculative accesses exist everywhere; they do not
> >>>> cause spurious page faults.
> >>> Confirmed by Ruiyang Wu, and even if DBAR(0b11000) is wrong, that
> >>> means another patch's mistake, not this one. This one just keeps the
> >>> old behavior.
> >>> +CC Ruiyang Wu here.
> >> Also from Ruiyang Wu, the information is that speculative accesses may
> >> insert stale TLB entries, but without a page fault exception.
> >>
> >> So adding a barrier in set_pte() does not prevent speculative accesses.
> >> And you wrote the patch here without knowing the actual reason?
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030
> > I have CCed Ruiyang; whether the description is correct can be judged by him.
> There are some problems with adding a barrier in set_pte():
> 1. Such an issue exists only with HW PTW enabled and only for the kernel
> address space, is that right?
> Also it may be too heavy to add a barrier in set_pte(), compared
> to doing this in flush_cache_vmap().
So adding a barrier in set_pte() may not be the best solution for
performance, but you cannot say it is a wrong solution. And yes, we only
need to care about the kernel space, which is also the old behavior
before this patch, so set_pte() should be:

static inline void set_pte(pte_t *ptep, pte_t pteval)
{
	WRITE_ONCE(*ptep, pteval);

#ifdef CONFIG_SMP
	if (pte_val(pteval) & _PAGE_GLOBAL)
		DBAR(0b11000); /* o_wrw = 0b11000 */
#endif
}

Putting a dbar unconditionally in set_pte() was my mistake, I'm sorry
for that.

>
> 2. LoongArch is different from other architectures: two pages are
> covered by one TLB entry. If two consecutive pages are mapped and
> accessed, there will be a page fault for the second memory access, such
> as:
> addr1 = percpu_alloc(pagesize);
> val1 = *(int *)addr1;
> // With page table walk, addr1 is present and addr2 is pte_none
> // TLB entry includes valid pte for addr1, invalid pte for addr2
> addr2 = percpu_alloc(pagesize); // will not flush tlb the first time
> val2 = *(int *)addr2;
> // With page table walk, addr1 is present and addr2 is present also
> // TLB entry includes valid pte for addr1, invalid pte for addr2
> So there will be a page fault when accessing address addr2
>
> There is the same problem with the user address space. By the way,
> there is HW prefetching; a negative effect of HW prefetching is that
> stale TLB entries may be added. So there is a potential page fault when
> memory is allocated and accessed for the first time.
As discussed internally, there may be three problems related to
speculative access, in detail:
1) a load/store after set_pte() may be executed speculatively before it,
which can be prevented by dbar;
2) an instruction fetch after set_pte() may be executed speculatively
before it, which can be prevented by ibar;
3) the buddy TLB problem you described here; if I understand Ruiyang's
explanation correctly, this can only be prevented by the filter in
do_page_fault().