On Thu, Oct 29, 2020 at 04:49:39PM -0400, Zi Yan wrote:
> On 29 Oct 2020, at 15:33, Matthew Wilcox (Oracle) wrote:
>
> > We currently store order-N THPs as 2^N consecutive entries.  While this
> > consumes rather more memory than necessary, it also turns out to be buggy.
> > A writeback operation which starts in the middle of a dirty THP will not
> > notice as the dirty bit is only set on the head index.  With multi-index
> > entries, the dirty bit will be found no matter where in the THP the
> > iteration starts.
>
> A multi-index entry can point to a THP with any size and the code relies
> on thp_last_tail() to check whether it has finished processing the page
> pointed by the entry. Is it how this change works?

Maybe I need to do a better explanation here.  Let me try again ...

Consider an order-2 page (at address p) at index 4.  Before this change,
the node in the XArray contains:

4: p
5: p
6: p
7: p

After this change, it contains:

4: p
5: sibling(4)
6: sibling(4)
7: sibling(4)

When we mark page p as dirty, we set a bit on entry 4, since that's the
head page.  Now we try to fsync pages 5-19, we start the lookup at index 5.
Before this patch, the pagecache knows that p is a head page, but the
XArray doesn't.  So when it looks at entry 5, it sees a normal pointer
and no mark on it -- the XArray doesn't get to interpret the contents
of the pointers stored in it.  After this patch, we tell the XArray that
indices 4-7 are a single entry, so the marked iteration actually loads the
entry at 5, sees it's a sibling of 4, sees that 4 is marked dirty and
returns p.
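
If it helps to see the difference in running code, here is a toy userspace
model of the example above.  It is only a sketch and not the XArray
implementation: struct toy_node, store_old(), store_multi(), find_dirty()
and lookup_finds_dirty_thp() are hypothetical names made up for
illustration, the node is a flat 64-slot array, and the "mark" is a plain
bool per slot.  The one idea it tries to capture is that a sibling slot
redirects the marked lookup to the head slot, while a duplicated pointer
does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 64	/* toy node size, chosen arbitrarily */

/* A toy slot is empty, a plain pointer, or a sibling naming its head slot. */
enum kind { EMPTY, POINTER, SIBLING };

struct slot {
	enum kind kind;
	const void *ptr;	/* valid when kind == POINTER */
	unsigned int head;	/* valid when kind == SIBLING */
};

struct toy_node {
	struct slot slots[SLOTS];
	bool dirty[SLOTS];	/* one mark bit per slot */
};

/* Old layout: 2^order slots that all hold the same pointer. */
static void store_old(struct toy_node *n, unsigned int index,
		      unsigned int order, const void *p)
{
	assert(index + (1u << order) <= SLOTS);
	for (unsigned int i = 0; i < (1u << order); i++)
		n->slots[index + i] = (struct slot){ .kind = POINTER, .ptr = p };
}

/* New layout: one head slot followed by 2^order - 1 sibling slots. */
static void store_multi(struct toy_node *n, unsigned int index,
			unsigned int order, const void *p)
{
	assert(index + (1u << order) <= SLOTS);
	n->slots[index] = (struct slot){ .kind = POINTER, .ptr = p };
	for (unsigned int i = 1; i < (1u << order); i++)
		n->slots[index + i] = (struct slot){ .kind = SIBLING, .head = index };
}

/* Return the first dirty entry in [start, end], or NULL if there is none. */
static const void *find_dirty(const struct toy_node *n,
			      unsigned int start, unsigned int end)
{
	for (unsigned int i = start; i <= end && i < SLOTS; i++) {
		unsigned int s = i;

		/* A sibling is not an entry of its own; look at its head. */
		if (n->slots[s].kind == SIBLING)
			s = n->slots[s].head;
		if (n->slots[s].kind == POINTER && n->dirty[s])
			return n->slots[s].ptr;
	}
	return NULL;
}

/*
 * Order-2 page at index 4, dirty mark on the head slot only, then a
 * lookup over 5-19.  Returns true if the lookup finds the page.
 */
bool lookup_finds_dirty_thp(bool multi_index)
{
	static const char page = 0;	/* stands in for page p */
	struct toy_node n = {0};

	if (multi_index)
		store_multi(&n, 4, 2, &page);
	else
		store_old(&n, 4, 2, &page);
	n.dirty[4] = true;

	return find_dirty(&n, 5, 19) == &page;
}
```

With the old layout, lookup_finds_dirty_thp(false) misses the page because
slots 5-7 carry no mark of their own; with the sibling layout,
lookup_finds_dirty_thp(true) lands on slot 5, follows it to slot 4 and
finds the mark there.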