On Thu, Apr 21, 2022 at 09:21:34PM +0100, Matthew Wilcox wrote:
> I wish I knew which 'index' we were looking up.  I'll try reproducing it
> locally so I can print that out too.

I can't reproduce it locally because the OOM killer says I don't have
enough RAM.  That's with giving 4GB to the VM.  If I give more than 4GB
to the VM, my laptop is insufficiently studly, and the host OOM killer
takes out qemu instead ;-P

> My suspicion is that there's a race where the folio is split during the
> lookup, and the bug is really in mapping_get_entry().  The folio->index
> is weird though; if this was the explanation, I'd expect it to find a
> page at a multiple of 512 or at least a multiple of 64.

I think I have an explanation (from thinking really hard, rather than
testing).  Before we call xas_split(), the tree looks like this:

node (shift=6)
  -> page (index 0)
  -> sibling of 0
  -> sibling of 0
  -> sibling of 0
  -> sibling of 0
  -> sibling of 0
  -> sibling of 0
  -> sibling of 0
  -> page (index 0x200)
  -> sibling of 8
  -> sibling of 8
  -> sibling of 8
  -> sibling of 8
  -> sibling of 8
  -> sibling of 8
  -> sibling of 8
  -> sibling of 8

Then we split the page at index 0x200.  Simultaneously, we try to load
the page at index 0x274 (or 2b4 or 2f4 or ... 3f4).  The load picks up
the sibling entry at offset 9 (0x274 >> 6), which says to refer to the
entry at offset 8.  But by the time it gets the entry at offset 8, the
split has replaced the compound page at index 0x200 with a node that
points to pages at indices 0x200-0x23f.

Solving it on the split side is possible, but I think it's easier to
solve on the load side.  I have a patch, it seems to work; let's see
what syzbot thinks of it:

#syz test: git://git.infradead.org/users/willy/xarray.git main
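
For anyone following the arithmetic above, here is a minimal userspace
sketch of the slot calculation.  It is not kernel code and not the patch;
CHUNK_SHIFT, slot_offset() and covered_after_split() are hypothetical
names made up for illustration, and it assumes 64 slots per node, as in
the shift=6 node drawn above.

```c
#include <assert.h>

/* Assumption: 64 slots per node, matching the shift=6 node in the diagram. */
#define CHUNK_SHIFT 6
#define CHUNK_MASK  ((1UL << CHUNK_SHIFT) - 1)

/* Hypothetical helper: slot that 'index' selects in a node with this shift. */
static unsigned long slot_offset(unsigned long index, unsigned int shift)
{
	return (index >> shift) & CHUNK_MASK;
}

/*
 * Hypothetical helper: after the split, the head slot holds a child node
 * covering only the indices [head, head + (1 << shift)).  Returns 1 if
 * 'index' falls inside that range, 0 otherwise.
 */
static int covered_after_split(unsigned long index, unsigned long head,
			       unsigned int shift)
{
	return index >= head && index < head + (1UL << shift);
}

/* Self-check using the indices mentioned in the explanation. */
static void check_race_window(void)
{
	/* 0x200 is the head entry, living in slot 8. */
	assert(slot_offset(0x200, CHUNK_SHIFT) == 8);
	/* 0x274 selects slot 9, which held a sibling entry pointing at slot 8. */
	assert(slot_offset(0x274, CHUNK_SHIFT) == 9);
	/* The new child node at slot 8 only covers 0x200-0x23f. */
	assert(covered_after_split(0x23f, 0x200, CHUNK_SHIFT));
	assert(!covered_after_split(0x274, 0x200, CHUNK_SHIFT));
}
```

Calling check_race_window() from a test main() passes silently.  The
point is only that a load which follows the stale sibling entry for 0x274
arrives at slot 8, whose new child node no longer covers the index being
looked up.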