On Tue, 14 Apr 2020 09:23:40 +0200 Dmitry Vyukov wrote:
> On Mon, Apr 13, 2020 at 10:32 PM Andrew Morton
> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Mon, 13 Apr 2020 00:50:11 -0700 syzbot
> > <syzbot+2854d22c7dd957a6519a@syzkaller.appspotmail.com> wrote:
> >
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit:    ae46d2aa mm/gup: Let __get_user_pages_locked() return -EIN..
> > > git tree:       upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=12b60343e00000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=ca75979eeebf06c2
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=2854d22c7dd957a6519a
> > > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+2854d22c7dd957a6519a@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > (cc's added)
> >
> > Looks like the loop backing device's pagecache still has a dirty page,
> > despite us having just run sync_blockdev().  It may well be a race of
> > some form - do we have any description of what the test is doing?
>
> Yes, it's probably loop related. And this is probably a very
> hard-to-trigger race.
> Below are the suspect programs that triggered this.
> This also happened on upstream before:
> https://syzkaller.appspot.com/bug?id=77543faae8aa91ae9993d8e0d34df41926b2dc8f
> And also on ChromeOS 4.19 and one 4.15 tree. But in all cases the rate
> is very low and syzkaller was never able to reproduce this. So I would
> assume this is a race with an inconsistency window of around a few
> instructions. But I think it is real because over the past year it
> happened 14 times, the reports are similar each time, the suspect
> programs are similar, and there are no red flags in these crashes.

If its comment is still correct, does it make sense to take the inode lock?
/**
 * truncate_inode_pages - truncate *all* the pages from an offset
 * @mapping: mapping to truncate
 * @lstart: offset from which to truncate
 *
 * Called under (and serialised by) inode->i_mutex.
 *
 * Note: When this function returns, there can be a page in the process of
 * deletion (inside __delete_from_page_cache()) in the specified range.  Thus
 * mapping->nrpages can be non-zero when this function returns even after
 * truncation of the whole mapping.
 */

(BTW, a tree-wide cleanup of comments that still refer to i_mutex looks needed.)

--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -83,7 +83,9 @@ void kill_bdev(struct block_device *bdev
 		return;
 
 	invalidate_bh_lrus();
+	inode_lock(bdev->bd_inode);
 	truncate_inode_pages(mapping, 0);
+	inode_unlock(bdev->bd_inode);
 }
 EXPORT_SYMBOL(kill_bdev);