On Thu, Mar 03, 2011 at 02:45:24PM +0800, Wu, Fengguang wrote:
> balance_dirty_pages() has been using a very simple and robust threshold
> based throttle scheme. It automatically limits the dirty rate down,
> however in a very bumpy way that constantly block the dirtier tasks for
> hundreds of milliseconds on a local ext4.

To get an idea of what exactly is going on in the current kernel, I
backported the balance_dirty_pages and global_page_state trace events
to 2.6.38-rc7 and ran the same test cases. The resulting graphs are
pretty striking.

In the worst NFS cases, the pause time frequently goes up to 20-30
seconds, and the dirty progress is rather bumpy.

1-dd case
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/NFS/nfs-1dd-1M-8p-2945M-20%25-2.6.38-rc7+-2011-03-07-23-14/balance_dirty_pages-pause.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/NFS/nfs-1dd-1M-8p-2945M-20%25-2.6.38-rc7+-2011-03-07-23-14/global_dirtied_written.png

8-dd case
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/NFS/nfs-8dd-1M-8p-2945M-20%25-2.6.38-rc7+-2011-03-07-23-26/balance_dirty_pages-pause.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/NFS/nfs-8dd-1M-8p-2945M-20%25-2.6.38-rc7+-2011-03-07-23-26/balance_dirty_pages-task-bw.png

Writes to the USB key start with a long 30-second pause, followed by
many ~2-second pauses for ext4. XFS is better; btrfs performs best, but
can still have 7s and 2s delays.
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/1UKEY+1HDD-3G/ext4-1dd-1M-8p-2945M-20%25-2.6.38-rc7+-2011-03-07-23-34/balance_dirty_pages-pause.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/1UKEY+1HDD-3G/xfs-1dd-1M-8p-2945M-20%25-2.6.38-rc7+-2011-03-07-23-56/balance_dirty_pages-pause.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/1UKEY+1HDD-3G/btrfs-1dd-1M-8p-2945M-20%25-2.6.38-rc7+-2011-03-08-00-14/balance_dirty_pages-pause.png

For normal writes to HDD, ext4 has some >300ms pause times in the 1-dd
case, >600ms in the 2-dd case, and >2s in the 8-dd case. The pause time
deteriorates roughly in proportion to the number of concurrent dd tasks.

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/4G/ext4-1dd-1M-8p-3911M-20%25-2.6.38-rc7+-2011-03-07-22-15/balance_dirty_pages-pause.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/4G/ext4-2dd-1M-8p-3911M-20%25-2.6.38-rc7+-2011-03-07-22-22/balance_dirty_pages-pause.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/4G/ext4-8dd-1M-8p-3911M-20%25-2.6.38-rc7+-2011-03-07-22-30/balance_dirty_pages-pause.png

XFS performs similarly
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/4G/xfs-8dd-1M-8p-3911M-20%25-2.6.38-rc7+-2011-03-07-22-08/balance_dirty_pages-pause.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/4G/xfs-8dd-1M-8p-3911M-20%25-2.6.38-rc7+-2011-03-07-22-08/balance_dirty_pages-task-bw.png

btrfs is better, typically with a 1-2s max pause time in the 8-dd case
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/4G/btrfs-8dd-1M-8p-3911M-20%25-2.6.38-rc7+-2011-03-07-21-48/balance_dirty_pages-pause.png

The long pause times will obviously ruin the user experience.
It may also hurt performance. For example, if the dirtier is a simple
"cp" or "scp", the long pause time will break the readahead pipeline or
the network pipeline, leading to moments of underutilized disk/network
bandwidth.

Compared to the above graphs, this patchset is able to keep latency
under control (less than the configured 200ms max pause time) in all
known cases, whether it be 1-dd or 1000-dd, on local file systems, over
NFS or on a USB key.

8-dd case
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/4G/ext4-8dd-1M-8p-3911M-20%25-2.6.38-rc7+-2011-03-07-22-30/balance_dirty_pages-task-bw.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/4G/xfs-8dd-1M-8p-3927M-20%25-2.6.38-rc6-dt6+-2011-02-27-23-18/balance_dirty_pages-task-bw.png

128-dd case
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/4G/xfs-128dd-1M-8p-3927M-20%25-2.6.38-rc6-dt6+-2011-02-27-23-25/balance_dirty_pages-pause.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/4G/xfs-128dd-1M-8p-3927M-20%25-2.6.38-rc6-dt6+-2011-02-27-23-25/balance_dirty_pages-task-bw.png

1000-dd case
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/10SSD-RAID0-64G/xfs-1000dd-1M-64p-64288M-20%25-2.6.38-rc6-dt6+-2011-02-28-10-40/balance_dirty_pages-pause.png

UKEY
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/1UKEY+1HDD-3G/ext4-1dd-1M-8p-2975M-20%25-2.6.38-rc6-dt6+-2011-02-28-20-21/balance_dirty_pages-pause.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/1UKEY+1HDD-3G/xfs-4dd-1M-8p-2945M-20%25-2.6.38-rc5-dt6+-2011-02-22-09-27/balance_dirty_pages-pause.png

NFS
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/NFS/nfs-1dd-1M-8p-2945M-20%25-2.6.38-rc6-dt6+-2011-02-22-21-09/balance_dirty_pages-pause.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/NFS/nfs-8dd-1M-8p-2945M-20%25-2.6.38-rc6-dt6+-2011-02-22-21-22/balance_dirty_pages-pause.png

Thanks,
Fengguang