Re: [PATCH] jbd jbd2: fix dio write returning EIO when try_to_release_page fails

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Tue, 19 Aug 2008 00:16:51 -0700

On Tue, 19 Aug 2008 16:03:45 +0900 Hisashi Hifumi <hifumi.hisashi@xxxxxxxxxxxxx> wrote:

> 
> At 21:59 08/08/13, Chris Mason wrote:
> >On Wed, 2008-08-13 at 12:16 +0200, Jan Kara wrote:
> >
> >> > With that said, I don't have strong feelings against falling back to
> >> > buffered IO when the invalidate fails.  Maybe Zach remembers something I
> >> > don't?
> >>   I don't have a strong opinion either. Falling back to buffered writes is
> >> simpler at least for ext3/ext4 because properly synchronizing against
> >> writepage() call does not seem to have a nice solution either in
> >> do_launder_page() or in releasepage(). OTOH it hides the fact the invalidate
> >> is failing and so if we screw up something in future and it fails often, it
> >> might be hard to notice / track down the performance penalty.
> >
> >In general, these races don't happen often, and when they do it is
> >because someone is mixing page cache and O_DIRECT io to the same file.
> >That is explicitly outside the main use case of O_DIRECT.
> >
> >So, I'd rather see us slow down O_DIRECT in the mixed use case than have
> >big impacts in complexity or speed to other parts of the kernel.  If
> >falling back avoids problems in some filesystems or avoids clearing the
> >uptodate bit unexpectedly, I'd much rather take the fallback patch.
> >
> >-chris
> 
> Hi Andrew.
> I think we don't have strong feelings against falling back to buffered writes to
> fix the direct-io -EIO problem.
> 
> Please review my patch.
> 

umm, what problem does it solve?

> 
> diff -Nrup linux-2.6.27-rc3.org/mm/filemap.c linux-2.6.27-rc3/mm/filemap.c
> --- linux-2.6.27-rc3.org/mm/filemap.c	2008-08-13 13:48:47.000000000 +0900
> +++ linux-2.6.27-rc3/mm/filemap.c	2008-08-19 15:45:31.000000000 +0900
> @@ -2129,13 +2129,20 @@ generic_file_direct_write(struct kiocb *
>  	 * After a write we want buffered reads to be sure to go to disk to get
>  	 * the new data.  We invalidate clean cached page from the region we're
>  	 * about to write.  We do this *before* the write so that we can return
> -	 * -EIO without clobbering -EIOCBQUEUED from ->direct_IO().
> +	 * without clobbering -EIOCBQUEUED from ->direct_IO().
>  	 */
>  	if (mapping->nrpages) {
>  		written = invalidate_inode_pages2_range(mapping,
>  					pos >> PAGE_CACHE_SHIFT, end);
> -		if (written)
> +		/*
> +		 * If a page can not be invalidated, return 0 to fall back
> +		 * to buffered write.
> +		 */
> +		if (written) {
> +			if (written == -EBUSY)
> +				return 0;
>  			goto out;
> +		}
>  	}
>  
>  	written = mapping->a_ops->direct_IO(WRITE, iocb, iov, pos, *nr_segs);
> diff -Nrup linux-2.6.27-rc3.org/mm/truncate.c linux-2.6.27-rc3/mm/truncate.c
> --- linux-2.6.27-rc3.org/mm/truncate.c	2008-08-13 13:48:48.000000000 +0900
> +++ linux-2.6.27-rc3/mm/truncate.c	2008-08-19 12:10:46.000000000 +0900
> @@ -380,7 +380,7 @@ static int do_launder_page(struct addres
>   * Any pages which are found to be mapped into pagetables are unmapped prior to
>   * invalidation.
>   *
> - * Returns -EIO if any pages could not be invalidated.
> + * Returns -EBUSY if any pages could not be invalidated.
>   */
>  int invalidate_inode_pages2_range(struct address_space *mapping,
>  				  pgoff_t start, pgoff_t end)
> @@ -440,7 +440,7 @@ int invalidate_inode_pages2_range(struct
>  			ret2 = do_launder_page(mapping, page);
>  			if (ret2 == 0) {
>  				if (!invalidate_complete_page2(mapping, page))
> -					ret2 = -EIO;
> +					ret2 = -EBUSY;
>  			}
>  			if (ret2 < 0)
>  				ret = ret2;

If I recall correctly, we had a problem with pages which are pinned by
an ext3 transaction, and those pages weren't releasable for direct-io,
and this caused some problem?

I think falling back to buffered writes is always a safe course, but
it'd be nice to have a full description of the change, please.
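
For anyone following the thread, the control flow the patch proposes can
be sketched as a small userspace model. This is only an illustration, not
kernel code: toy_direct_write() and toy_write() are hypothetical names, the
invalidate result is passed in as a parameter instead of coming from
invalidate_inode_pages2_range(), and the I/O itself is faked. It assumes the
caller treats a short (here, zero-byte) direct write as a request to finish
the write through the buffered path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical model of the patched direct-write path.
 * invalidate_result stands in for the return value of the page cache
 * invalidation: 0 on success, -EBUSY if a page could not be invalidated,
 * or some other negative errno.
 * Returns bytes written directly, 0 to request a buffered fallback,
 * or a negative errno.
 */
static ssize_t toy_direct_write(int invalidate_result, size_t len)
{
	if (invalidate_result == -EBUSY)
		return 0;			/* nothing written; fall back */
	if (invalidate_result)
		return invalidate_result;	/* real error, pass it up */
	return (ssize_t)len;			/* pretend direct I/O wrote it all */
}

/*
 * Hypothetical caller: if the direct write failed or wrote everything,
 * return its result; otherwise pretend the buffered path writes the rest.
 */
static ssize_t toy_write(int invalidate_result, size_t len, bool *used_buffered)
{
	ssize_t written = toy_direct_write(invalidate_result, len);

	*used_buffered = false;
	if (written < 0 || (size_t)written == len)
		return written;

	*used_buffered = true;
	return (ssize_t)len;			/* buffered path completes the write */
}
```

In this model an -EBUSY from the invalidation no longer reaches the caller
of toy_write(); the write completes through the buffered path, while any
other error is still returned unchanged.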
