Re: [PATCH 01/13] writeback: IO-less balance_dirty_pages()

"Ted Ts'o" <tytso@xxxxxxx> · Sun, 5 Dec 2010 21:42:31 -0500

On Mon, Dec 06, 2010 at 12:14:35AM +0800, Wu Fengguang wrote:
> 
> Ah I seem to find the root cause. See the attached graphs. Ext4 should
> be calling redirty_page_for_writepage() to redirty ~300MB pages on
> every ~10s. The redirties happen in big bursts, so not surprisingly
> the dd task's dirty weight will suddenly drop to 0.
> 
> It should be the same ext4 issue discussed here:
> 
>         http://www.spinics.net/lists/linux-fsdevel/msg39555.html

Yeah, unfortunately the fix suggested isn't the right one.

The right fix is going to involve making much more radical changes to
the ext4 write submission path, which is on my todo queue.  For now,
if people don't like these nasty writeback dynamics, my suggestion is
to mount the filesystem with data=writeback.

This is basically the clean equivalent of the patch suggested by Feng
Tang in his e-mail referenced above.  Given that ext4 uses delayed
allocation, most of the time unwritten blocks are not allocated, and
so stale data isn't exposed.
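
To check which journaling mode an ext4 filesystem ended up with after
mounting it with "-o data=writeback", one can look at the options
column of /proc/mounts.  The following is only a minimal sketch: the
function name and the sample text are made up for illustration, and the
kernel may not list data= at all when the default mode is in effect, so
a missing option is reported as None rather than guessed.

```python
def ext4_data_modes(mounts_text):
    """Map each ext4 mount point to its data= option (None if not listed).

    `mounts_text` is text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not ext4.
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        mount_point = fields[1]
        mode = None
        for opt in fields[3].split(","):
            if opt.startswith("data="):
                mode = opt[len("data="):]
        modes[mount_point] = mode
    return modes


# Hypothetical sample input, not taken from a real system.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime,data=ordered 0 0
/dev/sdb1 /scratch ext4 rw,noatime,data=writeback 0 0
/dev/sdc1 /data ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw 0 0
"""

for mount_point, mode in sorted(ext4_data_modes(SAMPLE).items()):
    print(mount_point, mode if mode is not None else "(data= not listed)")
```

On a live system the same function could be fed the contents of
/proc/mounts, e.g. ext4_data_modes(open("/proc/mounts").read()).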

The case which you're seeing here is where the jbd2 data=ordered
forced writeback is colliding with the writeback thread, and
unfortunately, the forced writeback in the jbd2 layer is done in an
extremely inefficient manner.  So data=writeback is the workaround,
and unlike on ext3, it's not a serious security leak.  It is possible
for some stale data to get exposed if you get unlucky when you crash,
though, so there is a potential for some security exposure.

The long-term solution to this problem is to rework the ext4 writeback
path so that we write the data blocks when they are newly allocated,
and then only update fs metadata once they are written.  As I said,
it's on my queue.  Until then, the only suggestion I can give folks is
data=writeback.
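
The ordering argument above can be illustrated with a toy model.  This
is a sketch under simplifying assumptions, not a description of the
actual ext4 or jbd2 code: the step names are invented, each step is
treated as atomic, and "stale data exposed" means that after a crash
the metadata already points at a newly allocated block whose data never
reached disk.

```python
def stale_exposed_after_crash(steps, crash_after):
    """True if crashing after `crash_after` completed steps leaves
    committed metadata pointing at a block that was never written."""
    done = set(steps[:crash_after])
    return "commit_metadata" in done and "write_data" not in done


def any_crash_exposes_stale(steps):
    """Try a crash at every point in the sequence, including before the
    first step and after the last one."""
    return any(
        stale_exposed_after_crash(steps, n) for n in range(len(steps) + 1)
    )


# Hypothetical orderings.  The first models metadata reaching disk
# before the data it describes; the second models the proposed rework,
# where data blocks are written first and metadata is updated afterwards.
METADATA_FIRST = ["allocate_block", "commit_metadata", "write_data"]
DATA_FIRST = ["allocate_block", "write_data", "commit_metadata"]

print("metadata first:", any_crash_exposes_stale(METADATA_FIRST))
print("data first:    ", any_crash_exposes_stale(DATA_FIRST))
```

In this model only the metadata-first ordering has a crash point at
which stale block contents become visible; writing the data before
updating metadata closes that window without needing a forced flush at
commit time.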

						- Ted