On Mon, Dec 06, 2010 at 12:14:35AM +0800, Wu Fengguang wrote:
>
> Ah I seem to find the root cause. See the attached graphs. Ext4 should
> be calling redirty_page_for_writepage() to redirty ~300MB pages on
> every ~10s. The redirties happen in big bursts, so not surprisingly
> the dd task's dirty weight will suddenly drop to 0.
>
> It should be the same ext4 issue discussed here:
>
>         http://www.spinics.net/lists/linux-fsdevel/msg39555.html

Yeah, unfortunately the fix suggested isn't the right one. The right
fix is going to involve making much more radical changes to the ext4
write submission path, which is on my todo queue. For now, if people
don't like these nasty writeback dynamics, my suggestion is to mount
the filesystem with data=writeback.

This is basically the clean equivalent of the patch suggested by Feng
Tang in his e-mail referenced above. Given that ext4 uses delayed
allocation, most of the time unwritten blocks are not allocated, and
so stale data isn't exposed.

The case which you're seeing here is where the jbd2 data=ordered
forced writeback is colliding with the writeback thread, and
unfortunately, the forced writeback in the jbd2 layer is done in an
extremely inefficient manner. So data=writeback is the workaround,
and unlike with ext3, it's not a serious security leak. It is possible
for some stale data to get exposed if you get unlucky when you crash,
though, so there is a potential for some security exposure.

The long-term solution to this problem is to rework the ext4 writeback
path so that we write the data blocks when they are newly allocated,
and then only update fs metadata once they are written. As I said,
it's on my queue. Until then, the only suggestion I can give folks is
data=writeback.

						- Ted
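
[Editorial sketch, not part of the original message.] For anyone trying
the data=writeback workaround, it can help to confirm which journaling
mode each ext4 filesystem actually ended up with. The minimal Python
sketch below parses text in the /proc/mounts layout (device, mount
point, filesystem type, comma-separated options) and reports the data=
option for every ext4 entry. The function name ext4_data_modes is
hypothetical, and the sketch assumes the kernel lists a data= option
for the mount; when none is listed it returns None rather than guessing
the default.

```python
def ext4_data_modes(mounts_text):
    """Map each ext4 mount point to its data= option (None if not listed).

    mounts_text is expected to be in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not ext4.
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        mount_point = fields[1]
        mode = None
        for opt in fields[3].split(","):
            if opt.startswith("data="):
                mode = opt[len("data="):]
        modes[mount_point] = mode
    return modes
```

On a Linux system, passing the contents of /proc/mounts to this
function gives a quick per-filesystem view; a mount showing "ordered"
rather than "writeback" has not picked up the workaround.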