Re: [PATCH] jbd jbd2: fix dio write returning EIO when try_to_release_page fails

On Mon, 2008-08-04 at 20:10 +0900, Hisashi Hifumi wrote:
> Hi
> 
> Dio write returns EIO when try_to_release_page fails because bh is
> still referenced.
> The patch 
> "commit 3f31fddfa26b7594b44ff2b34f9a04ba409e0f91
> Author: Mingming Cao <cmm@xxxxxxxxxx>
> Date:   Fri Jul 25 01:46:22 2008 -0700
> 
>     jbd: fix race between free buffer and commit transaction
> " 
> was merged into 2.6.27-rc1, but I noticed that this patch is not enough
> to fix the race.
> I ran fsstress heavily against 2.6.27-rc1 and found that dio write still
> sometimes returned EIO during the test.

:(  I thought we had beaten that race pretty hard already.

Could you send me the fsstress command to reproduce the race?

> The patch above fixed the race between freeing a buffer (dio) and
> committing a transaction (jbd), but I discovered that there is another
> race, between freeing a buffer (dio) and ext3/4_ordered_writepage.
> : background_writeout()
>      ->write_cache_pages()
>        ->ext3_ordered_writepage()
>            walk_page_buffers() <- take a bh ref
>            block_write_full_page() <- unlock_page
>                 : <- end_page_writeback
>                 : <- race! (dio write->try_to_release_page fails)
>            walk_page_buffers() <- release a bh ref
> 
> ext3_ordered_writepage holds a bh ref and calls unlock_page while
> still holding that ref, so this causes the race and the failure of
> try_to_release_page.
> 
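To make the reported window concrete, here is a minimal userspace sketch, not kernel code. The names model_bh, model_get_bh(), model_put_bh() and model_try_to_release() are made up for illustration. The only assumption carried over from the report is that a buffer with a nonzero b_count cannot be released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct buffer_head: only the reference
 * count matters for this illustration. */
struct model_bh {
	int b_count;
};

/* Take a reference, as the first walk_page_buffers() pass is reported to do. */
static void model_get_bh(struct model_bh *bh)
{
	bh->b_count++;
}

/* Drop a reference, as the second walk_page_buffers() pass is reported to do. */
static void model_put_bh(struct model_bh *bh)
{
	assert(bh->b_count > 0);
	bh->b_count--;
}

/* Assumed rule: release succeeds only when nobody holds a reference. */
static bool model_try_to_release(const struct model_bh *bh)
{
	return bh->b_count == 0;
}
```

Calling model_try_to_release() between model_get_bh() and model_put_bh() returns false, which would correspond to the reported EIO case. After the put it returns true.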

I thought about this before, and the race seems unlikely to me. Perhaps I
missed something, but the DIO code already waits for all pending IO to
finish before calling try_to_release_page(), which eventually calls
journal_try_to_free_buffers(). During this call the inode mutex is held
to prevent a new writer (buffered or DIO) from re-dirtying the pages. If
background writeout is in progress when DIO kicks in, DIO first waits for
the writeback bit to clear on all the pages. Here is the stack:

generic_file_aio_write()
  -> mutex_lock(&inode->i_mutex);
  -> __generic_file_aio_write_nolock()
     -> generic_file_direct_IO()
        ->filemap_write_and_wait()
           -> filemap_fdatawait()
              -> wait_on_page_writeback_range()
                 (==== waiting for pending IO to finish ====)
      ->invalidate_inode_pages2_range()
          ->invalidate_inode_pages2()
             ->try_to_release_page()
                ->ext3_releasepage()
                    ->journal_try_to_free_buffers()

> Following patch fixes this race.
> Thanks.
> 
> Signed-off-by: Hisashi Hifumi <hifumi.hisashi@xxxxxxxxxxxxx>
> 
> diff -Nrup linux-2.6.27-rc1.org/fs/jbd/transaction.c linux-2.6.27-rc1/fs/jbd/transaction.c
> --- linux-2.6.27-rc1.org/fs/jbd/transaction.c	2008-07-29 19:28:47.000000000 +0900
> +++ linux-2.6.27-rc1/fs/jbd/transaction.c	2008-07-29 20:40:12.000000000 +0900
> @@ -1764,6 +1764,12 @@ int journal_try_to_free_buffers(journal_
>  	*/
>  	if (ret == 0 && (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS)) {
>  		journal_wait_for_transaction_sync_data(journal);
> +
> +		bh = head;
> +		do {
> +			while (atomic_read(&bh->b_count))
> +				schedule();
> +		} while ((bh = bh->b_this_page) != head);
>  		ret = try_to_free_buffers(page);
>  	}
> 
> diff -Nrup linux-2.6.27-rc1.org/fs/jbd2/transaction.c linux-2.6.27-rc1/fs/jbd2/transaction.c
> --- linux-2.6.27-rc1.org/fs/jbd2/transaction.c	2008-07-29 19:28:47.000000000 +0900
> +++ linux-2.6.27-rc1/fs/jbd2/transaction.c	2008-07-29 20:56:42.000000000 +0900
> @@ -1583,6 +1583,12 @@ int jbd2_journal_try_to_free_buffers(jou
>  	*/
>  	if (ret == 0 && (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS)) {
>  		jbd2_journal_wait_for_transaction_sync_data(journal);
> +
> +		bh = head;
> +		do {
> +			while (atomic_read(&bh->b_count))
> +				schedule();
> +		} while ((bh = bh->b_this_page) != head);
>  		ret = try_to_free_buffers(page);
>  	}
> 
> 
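The loop added by the patch walks the page's circular b_this_page list. Below is a userspace sketch of that walk, with made-up names (ring_bh, ring_count_busy). Unlike the patch, it does not call schedule() or wait; it only counts the buffers that still have a nonzero b_count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head on a page's circular list. */
struct ring_bh {
	int b_count;
	struct ring_bh *b_this_page; /* next buffer; the last one points back to head */
};

/* Walk the ring once, using the same do/while shape as the patch, and
 * count the buffers that are still referenced. */
static size_t ring_count_busy(struct ring_bh *head)
{
	struct ring_bh *bh = head;
	size_t busy = 0;

	assert(head != NULL);
	do {
		if (bh->b_count > 0)
			busy++;
	} while ((bh = bh->b_this_page) != head);
	return busy;
}
```

A count of zero is the condition the patch's loop waits for before it calls try_to_free_buffers().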
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
