On Wed, Nov 16, 2022 at 01:45:15PM +0300, Anatoly Pugachev wrote: > On Wed, Nov 16, 2022 at 11:49 AM hev <r@xxxxxx> wrote: > > > > Hello Peter, Hi, Hev, Thanks for letting me know. > > > > I see a random crash issue on the LoongArch system, that is caused by > > commit 0ccf7f1 ("mm/thp: carry over dirty bit when thp splits on > > pmd"). > > > > Now, the thing is already resolved. The root cause is arch's mkdirty > > is set hardware writable bit in unconditional. That breaks > > write-protect and then breaks COW. Could you help explain how that happened? I'm taking example of loongarch here: static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= (_PAGE_DIRTY | _PAGE_MODIFIED); return pte; } #define _PAGE_MODIFIED (_ULCAST_(1) << _PAGE_MODIFIED_SHIFT) #define _PAGE_MODIFIED_SHIFT 9 #define _PAGE_DIRTY (_ULCAST_(1) << _PAGE_DIRTY_SHIFT) #define _PAGE_DIRTY_SHIFT 1 I don't see when write bit is set, which is bit 8 instead: #define _PAGE_WRITE (_ULCAST_(1) << _PAGE_WRITE_SHIFT) #define _PAGE_WRITE_SHIFT 8 According to loongarch spec: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#section-multi-level-page-table-structure-supported-by-page-walking Bits 1 & 8 match the spec D & W definitions. Bit 9 seems not defined but I didn't quickly spot how that's related to the write bit. > > > > Here is a simple and fast testcase (It may be helpful for sparc64): > > https://gist.github.com/heiher/72919fae6b53f04cac606a9631100506 > > (assertion: c sum == 0) > > Just tried on my sparc64 VM - fixed vs old (non-patched) kernels... > > fixed kernel (6.1.0-rc5) running ./a.out: > mator@ttip:~$ ./a.out > c sum: 0 > p sum: 35184372088832 > c sum: 0 > p sum: 35184372088832 > c sum: 0 > p sum: 35184372088832 > c sum: 0 > p sum: 35184372088832 > c sum: 0 > p sum: 35184372088832 > ... > > old (non-patched) kernel (6.1.0-rc4) : > mator@ttip:~$ ./a.out > c sum: 35150012350464 > p sum: 35184372088832 > c sum: 35150012350464 > p sum: 35184372088832 > ... Thanks for the quick run, Anatoly. Obviously I went the wrong way before on the code patching. It seems we have more chance fixing this. -- Peter Xu