On Mon, Jul 23, 2018 at 12:14:41PM -0700, Hugh Dickins wrote: > On Mon, 23 Jul 2018, Matthew Wilcox wrote: > > On Sun, Jul 22, 2018 at 07:28:01PM -0700, Hugh Dickins wrote: > > > Whether or not that fixed syzbot's kernel BUG at mm/shmem.c:815! > > > I don't know, but I'm afraid it has not fixed linux-next breakage of > > > huge tmpfs: I get a similar page_to_pgoff BUG at mm/filemap.c:1466! > > > > > > Please try something like > > > mount -o remount,huge=always /dev/shm > > > cp /dev/zero /dev/shm > > > > > > Writing soon crashes in find_lock_entry(), looking up offset 0x201 > > > but getting the page for offset 0x3c1 instead. > > > > Hmm. I don't see a crash while running that command, > > Thanks for looking. > > It is the VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) > in find_lock_entry(). Perhaps you didn't have CONFIG_DEBUG_VM=y > on this occasion? Or you don't think of an oops as a kernel crash, > and didn't notice it in dmesg? I see now that I've arranged for oops > to crash, since I don't like to miss them myself; but it is a very > clean oops, no locks held, so can just kill the process and continue. Usually I run with that turned on, but somehow in my recent messing with my test system, that got turned off. Once I turned it back on, it spots the bug instantly. > Or is there something more mysterious stopping it from showing up for > you? It's repeatable for me. When not crashing, that "cp" should fill > up about half of RAM before it hits the implicit tmpfs volume limit; > but I am assuming a not entirely fragmented machine - it does need > to allocate two 2MB pages before hitting the VM_BUG_ON_PAGE(). I tried that too, before noticing that DEBUG_VM was off; raised my test VM's memory from 2GB to 8GB. > Are you sure that those pages are free, rather than most of them tails > of one of the two compound pages involved? I think it's the same in your > rewrite of struct page, the compound_head field (lru.next), with its low > bit set, were how to recognize a tail page. Yes, PageTail was set, and so was TAIL_MAPPING (0xdead0000000000400). What was going on was the first 2MB page was being stored at indices 0-511, then the second 2MB page was being stored at indices 64-575 instead of 512-1023. I figured out a fix and pushed it to the 'ida' branch in git://git.infradead.org/users/willy/linux-dax.git It won't be in linux-next tomorrow because the nvdimm people have just dumped a pile of patches into their tree that conflict with the XArray-DAX rewrite, so Stephen has pulled the XArray tree out of linux-next temporarily. I didn't have time to sort out the merge conflict today because I judged your bug report more important.