Re: [PATCH] ext4: fix a bug in ext4_wait_for_tail_page_commit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2019/9/18 21:27, Jan Kara wrote:
On Wed 18-09-19 21:09:00, yangerkun wrote:
On 2019/9/18 18:45, Jan Kara wrote:
On Tue 17-09-19 16:48:14, yangerkun wrote:
No need to wait when offset equals to 0. And it will trigger a bug since
the latter __ext4_journalled_invalidatepage can free the buffers but leave
page still dirty.

[   26.057508] ------------[ cut here ]------------
[   26.058531] kernel BUG at fs/ext4/inode.c:2134!
...
[   26.088130] Call trace:
[   26.088695]  ext4_writepage+0x914/0xb28
[   26.089541]  writeout.isra.4+0x1b4/0x2b8
[   26.090409]  move_to_new_page+0x3b0/0x568
[   26.091338]  __unmap_and_move+0x648/0x988
[   26.092241]  unmap_and_move+0x48c/0xbb8
[   26.093096]  migrate_pages+0x220/0xb28
[   26.093945]  kernel_mbind+0x828/0xa18
[   26.094791]  __arm64_sys_mbind+0xc8/0x138
[   26.095716]  el0_svc_common+0x190/0x490
[   26.096571]  el0_svc_handler+0x60/0xd0
[   26.097423]  el0_svc+0x8/0xc

Run below parallel can reproduce it easily(ext3):
void main()
{
          int fd, fd1, fd2, fd3, ret;
          void *addr;
          size_t length = 4096;
          int flags;
          off_t offset = 0;
          char *str = "12345";

          fd = open("a", O_RDWR | O_CREAT);
          assert(fd >= 0);

          ret = ftruncate(fd, length);
          assert(ret == 0);

          fd1 = open("a", O_RDWR | O_CREAT, -1);
          assert(fd1 >= 0);

          flags = 0xc00f;/*Journal data mode*/
          ret = ioctl(fd1, _IOW('f', 2, long), &flags);
          assert(ret == 0);

          fd2 = open("a", O_RDWR | O_CREAT);
          assert(fd2 >= 0);

          fd3 = open("a", O_TRUNC | O_NOATIME);
          assert(fd3 >= 0);

          addr = mmap(NULL, length, 0xe, 0x28013, fd2, offset);

Ugh, these mmap flags look pretty bogus. Were they generated by some
fuzzer?
Yeah, generated by syzkaller.

          assert(addr != (void *)-1);
          memcpy(addr, str, 5);

Also the O_TRUNC open above will truncate "a" to 0 so the mapping is
actually beyond i_size and this memcpy should fail with SIGBUS. So I'm
surprised your test program gets up to mbind()...

We run the program parallel, sometimes will run as below:

reproduce1                         reproduce2

...                            |   ...
truncate to 4k                 |
change to journal data mode    |
                                |   memcpy(set page dirty)
truncate to 0:                 |
ext4_setattr:                  |
...                            |
ext4_wait_for_tail_page_commit |
                                |   mbind(trigger bug)
truncate_pagecache(clean dirty)|   ...
...                            |
Reproduce2 will mark page as dirty by memcpy, then mbind run between
ext4_wait_for_tail_page_commit and truncate_pagecache in ext4_setattr can
trigger the bug with page still be dirty but buffer head has been free.

Aha! Thanks for explanation. Makes sense. So I agree with your patch but we
also need to update the comment before the condition in
ext4_wait_for_tail_page_commit(). Something like:

If the page is fully truncated, we don't need to wait for any commit (and
we even should not as __ext4_journalled_invalidatepage() may strip all
buffers from the page but keep the page dirty which can then confuse e.g.
concurrent ext4_writepage() seeing dirty page without buffers). Also we
don't need to wait for any commit if all buffers in the page remain valid.
This is most beneficial for the common case of blocksize == PAGE_SIZE.

I will add this comment and reorganize the patch. Thanks a lot!


								Honza

---
   fs/ext4/inode.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 006b7a2070bf..a9943ae4f74d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5479,7 +5479,7 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode)
   	 * do. We do the check mainly to optimize the common PAGE_SIZE ==
   	 * blocksize case
   	 */
-	if (offset > PAGE_SIZE - i_blocksize(inode))
+	if (!offset || offset > PAGE_SIZE - i_blocksize(inode))
   		return;
   	while (1) {
   		page = find_lock_page(inode->i_mapping,




[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux