On Sun, Aug 18, 2019 at 2:31 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > On Thu, Aug 15, 2019 at 01:09:11AM +1000, Monthero Ronald wrote: > > The patch checks for this condition of NULL pointer for the buffer_head returned from page_buffers() > > and also a check placed within the list traversal loop for next buffer_head structs. > > > > crash scenario: > > The buffer_head returned from page_buffers() is not checked in block_invalidatepage_range function. > > The struct buffer_head* pointer returned by page_buffers(page) was 0x0, although this page had its > > private flag PG_private bit set and was expected to have buffer_head structs attached.The NULL pointer > > buffer_head was dereferenced in block_invalidatepage_range function at bh->b_size, where bh returned by > > page_buffers(page) was 0x0. > > > > The stack frames were truncate_inode_page() => do_invalidatepage_range() => xfs_vm_invalidatepage() => > > [exception RIP: block_invalidatepage_range+132] > > > > The inode for truncate in this case was valid and had proper inode.i_state = 0x20 - FREEING and had > > a valid mapped address space to xfs. And the struct page in context of block_invalidatepage_range() > > had its page flag PG_private set but the page.private was 0x0. So page_buffers(page) returned 0x0 > > and hence the crash. > > This patch performs NULL pointer check for returned buffer_head. Applies to 3.16 and later kernels. > > ... and adds BUG_ON() for that. The only real difference from an oops is > that it's a bit easier to recognize. Which may or may not be a good > debugging strategy, but what's the point of having it in -stable? Or > anywhere other than the build on the boxen you are testing on... > > It doesn't fix the underlying bug. It doesn't tell where the problem > is. It's definitely *not* a way to fix any bugs. And while we > are at it, the stuff in -stable ought to be backports from mainline. > > Can you reproduce your crashes on mainline? No I don't have a reproducer. This was an abrupt crash, where the page has its private flag set but the buffer_head returned by page_buffers( ) was 0x0. Having the BUG_ON check for the returned buffer_head from page_buffers( ) would help to to see the BUG line readily than to analyze the vmcore and check the fault condition why kernel panicked. Two cents here, but open to thoughts. ( This crash to reproduce is not trivial, nevertheless will attempt to check if there are any mm tests or page mapping tests that can induce this condition )