On Wed, Jun 10, 2020 at 01:12:54PM -0700, Matthew Wilcox wrote:
> Another fortnight, another dump of my current large pages work.

The generic/127 test has pointed out to me that range writeback is
broken by this patchset.  Here's how (may not be exactly what's going on,
but it's close):

page cache allocates an order-2 page covering indices 40-43.
bytes are written, page is dirtied
test then calls fallocate(FALLOC_FL_COLLAPSE_RANGE) for a range which
starts in page 41.
XFS calls filemap_write_and_wait_range() which calls
__filemap_fdatawrite_range() which calls
do_writepages() which calls
iomap_writepages() which calls
write_cache_pages() which calls
tag_pages_for_writeback() which calls
xas_for_each_marked() starting at page 41.  Which doesn't find page
41 because when we dirtied pages 40-43, we only marked index 40 as
being dirty.

Annoyingly, the XArray actually handles this just fine ... if we were
using multi-order entries, we'd find it.  But we're still storing 2^N
entries for an order N page.

I can see two ways to fix this.  One is to bite the bullet and do the
conversion of the page cache to use multi-order entries.  The second
is to set and clear the marks on all entries.  I'm concerned about the
performance of the latter solution.  Not so bad for order-2 pages, but for
an order-9 page we have 520 bits to set, spread over 9 non-consecutive
cachelines.  Also, I'm unenthusiastic about writing code that I want to
throw away as quickly as possible.

So unless somebody has a really good alternative idea, I'm going to
convert the page cache over to multi-order entries.  This will have
several positive effects:

 - Get DAX and regular page cache using the xarray in a more similar way
 - Saves about 4.5kB of memory for every 2MB page in tmpfs/shmem
 - Prep work for converting hugetlbfs to use the page cache the same
   way as tmpfs