On Thu, Nov 17, 2022 at 10:29:57AM +0800, hev wrote: > Hi Peter, Hi, Hev, > > On Thu, Nov 17, 2022 at 12:25 AM Peter Xu <peterx@xxxxxxxxxx> wrote: > > > > On Wed, Nov 16, 2022 at 01:45:15PM +0300, Anatoly Pugachev wrote: > > > On Wed, Nov 16, 2022 at 11:49 AM hev <r@xxxxxx> wrote: > > > > > > > > Hello Peter, > > > > Hi, Hev, > > > > Thanks for letting me know. > > > > > > > > > > I see a random crash issue on the LoongArch system, that is caused by > > > > commit 0ccf7f1 ("mm/thp: carry over dirty bit when thp splits on > > > > pmd"). > > > > > > > > Now, the thing is already resolved. The root cause is arch's mkdirty > > > > is set hardware writable bit in unconditional. That breaks > > > > write-protect and then breaks COW. > > > > Could you help explain how that happened? > > > > I'm taking example of loongarch here: > > > > static inline pte_t pte_mkdirty(pte_t pte) > > { > > pte_val(pte) |= (_PAGE_DIRTY | _PAGE_MODIFIED); > > return pte; > > } > > > > #define _PAGE_MODIFIED (_ULCAST_(1) << _PAGE_MODIFIED_SHIFT) > > #define _PAGE_MODIFIED_SHIFT 9 > > _PAGE_MODIFIED is a software dirty bit > > > #define _PAGE_DIRTY (_ULCAST_(1) << _PAGE_DIRTY_SHIFT) > > #define _PAGE_DIRTY_SHIFT 1 > > _PAGE_DIRTY is a hardware writable bit (bad naming), meaning that mmu > allows write memory without any exception raised. (I just missed this email before I reply to the other one, I should have read this one first..) I see. This surprises me a bit, as I can't quickly tell how it'll always work with the generic mm code. Say, is there a quick answer on why _PAGE_DIRTY is set here rather than pte_mkwrite()? Because AFAIU that's where the mm wants to grant write permission to a page table entry as the API, no? > > > > > I don't see when write bit is set, which is bit 8 instead: > > > > #define _PAGE_WRITE (_ULCAST_(1) << _PAGE_WRITE_SHIFT) > > #define _PAGE_WRITE_SHIFT 8 > > _PAGE_WRITE is a software writable bit (not hardware). > > As David said, In __split_huge_pmd_locked, the VMA does not include VM_WRITE, > > entry = maybe_mkwrite(entry, vma); > > so the pte does not include software writable bit (_PAGE_WRITE). Are you sure? In your test case you mapped with RW, IIUC it means even after the fork() VM_WRITE is set on both sides? But I agree the write bit is not set, not because !VM_WRITE, but because we take care of that explicitly to make sure pte has the same write bit as pmd: (pmd used to be wr-protected due to fork()) write = pmd_write(old_pmd); ... (then when split pte shouldn't have write bit too) if (!write) entry = pte_wrprotect(entry); > > and the dirty is true, > > if (dirty) > entry = pte_mkdirty(entry); > > so the incorrect arch's pte_mkdirty set hardware writable > bit(_PAGE_DIRTY) in unconditional for read-only pages. True, that does also apply to sparc64 pte_mkdirty() with _PAGE_W_4[UV]. I should have noticed earlier that its comment told me that's a write bit already.. #define _PAGE_W_4U _AC(0x0000000000000002,UL) /* Writable */ Thanks, -- Peter Xu