Hi, Peter, On Fri, Nov 18, 2022 at 2:29 AM Peter Xu <peterx@xxxxxxxxxx> wrote: > > On Thu, Nov 17, 2022 at 10:29:57AM +0800, hev wrote: > > Hi Peter, > > Hi, Hev, > > > > > On Thu, Nov 17, 2022 at 12:25 AM Peter Xu <peterx@xxxxxxxxxx> wrote: > > > > > > On Wed, Nov 16, 2022 at 01:45:15PM +0300, Anatoly Pugachev wrote: > > > > On Wed, Nov 16, 2022 at 11:49 AM hev <r@xxxxxx> wrote: > > > > > > > > > > Hello Peter, > > > > > > Hi, Hev, > > > > > > Thanks for letting me know. > > > > > > > > > > > > > I see a random crash issue on the LoongArch system, that is caused by > > > > > commit 0ccf7f1 ("mm/thp: carry over dirty bit when thp splits on > > > > > pmd"). > > > > > > > > > > Now, the thing is already resolved. The root cause is arch's mkdirty > > > > > is set hardware writable bit in unconditional. That breaks > > > > > write-protect and then breaks COW. > > > > > > Could you help explain how that happened? > > > > > > I'm taking example of loongarch here: > > > > > > static inline pte_t pte_mkdirty(pte_t pte) > > > { > > > pte_val(pte) |= (_PAGE_DIRTY | _PAGE_MODIFIED); > > > return pte; > > > } > > > > > > #define _PAGE_MODIFIED (_ULCAST_(1) << _PAGE_MODIFIED_SHIFT) > > > #define _PAGE_MODIFIED_SHIFT 9 > > > > _PAGE_MODIFIED is a software dirty bit > > > > > #define _PAGE_DIRTY (_ULCAST_(1) << _PAGE_DIRTY_SHIFT) > > > #define _PAGE_DIRTY_SHIFT 1 > > > > _PAGE_DIRTY is a hardware writable bit (bad naming), meaning that mmu > > allows write memory without any exception raised. > > (I just missed this email before I reply to the other one, I should have > read this one first..) > > I see. This surprises me a bit, as I can't quickly tell how it'll always > work with the generic mm code. > > Say, is there a quick answer on why _PAGE_DIRTY is set here rather than > pte_mkwrite()? Because AFAIU that's where the mm wants to grant write > permission to a page table entry as the API, no? > > > > > > > > > I don't see when write bit is set, which is bit 8 instead: > > > > > > #define _PAGE_WRITE (_ULCAST_(1) << _PAGE_WRITE_SHIFT) > > > #define _PAGE_WRITE_SHIFT 8 > > > > _PAGE_WRITE is a software writable bit (not hardware). > > > > As David said, In __split_huge_pmd_locked, the VMA does not include VM_WRITE, > > > > entry = maybe_mkwrite(entry, vma); > > > > so the pte does not include software writable bit (_PAGE_WRITE). > > Are you sure? In your test case you mapped with RW, IIUC it means even > after the fork() VM_WRITE is set on both sides? Sorry, I was wrong. In this case, both VMAs are writable, the pte's writable bit is cleared by pte_wrprotect. So if pte_mkdirty sets hardware writable bit unconditionally, then there will be no way to catch writes to implement COW. I will try to explain how it works about pte write, dirty and write-protect on LoongArch in the LoongArch mailing-list. Regards, Ray > > But I agree the write bit is not set, not because !VM_WRITE, but because we > take care of that explicitly to make sure pte has the same write bit as pmd: > > (pmd used to be wr-protected due to fork()) > write = pmd_write(old_pmd); > ... > > (then when split pte shouldn't have write bit too) > if (!write) > entry = pte_wrprotect(entry); > > > > > and the dirty is true, > > > > if (dirty) > > entry = pte_mkdirty(entry); > > > > so the incorrect arch's pte_mkdirty set hardware writable > > bit(_PAGE_DIRTY) in unconditional for read-only pages. > > True, that does also apply to sparc64 pte_mkdirty() with _PAGE_W_4[UV]. I > should have noticed earlier that its comment told me that's a write bit > already.. > > #define _PAGE_W_4U _AC(0x0000000000000002,UL) /* Writable */ > > Thanks, > > -- > Peter Xu >