On Thu, Dec 16, 2010 at 02:48:29AM +0800, Richard Kennedy wrote:
> On Tue, 2010-12-14 at 21:59 +0800, Wu Fengguang wrote:
> > On Tue, Dec 14, 2010 at 09:37:34PM +0800, Richard Kennedy wrote:
> > > Hi Fengguang,
> > >
> > > I've been running my test set on your v3 series and generally it's
> > > giving good results in line with the mainline kernel, with much less
> > > variability and lower standard deviation of the results, so it is
> > > much more repeatable.
> >
> > Glad to hear that, and thank you very much for trying it out!
> >
> > > However, it doesn't seem to be honouring the background_dirty_threshold.
> > >
> > > The attached graph is from a simple fio write test of 400MB on ext4.
> > > All dirty pages are completely written in 15 seconds, but I expect to
> > > see up to background_dirty_threshold pages staying dirty until the
> > > 30-second background task writes them out. So it is much too eager to
> > > write back dirty pages.
> >
> > This is interesting, and seems easy to root cause. When testing v4,
> > would you help collect the following trace events?
> >
> > echo 1 > /debug/tracing/events/writeback/balance_dirty_pages/enable
> > echo 1 > /debug/tracing/events/writeback/balance_dirty_state/enable
> > echo 1 > /debug/tracing/events/writeback/writeback_single_inode/enable
> >
> > They'll have a good opportunity to disclose the bug.
> >
> > > As to the ramp-up time, when writing to 2 disks at the same time I
> > > see the per_bdi_threshold taking up to 20 seconds to converge on a
> > > steady value after one of the writes stops. So I think this could be
> > > sped up even more, at least on my setup.
> >
> > I have roughly the same ramp-up time on the 1-disk 3GB mem test:
> >
> > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/tests/3G/ext4-1dd-1M-8p-2952M-2.6.37-rc5+-2010-12-09-00-37/dirty-pages.png
> >
> > Given that it's the typical desktop, it does seem reasonable to speed
> > it up further.
> > > I am just about to start testing v4 & will report anything interesting.
> >
> > Thanks!
> >
> > Fengguang
>
> I just mailed the trace log to Fengguang; it is a bit big to post to
> this list. If anyone wants it, let me know and I'll mail it to them
> directly.
>
> I'm also seeing a write stall in some of my tests. When writing 400MB,
> after about 6 seconds I see a few seconds when there are no reported
> sectors written to sda and there are no pages under writeback, although
> there are lots of dirty pages. (The graph I sent previously shows this
> stall as well.)

I managed to reproduce your workload; see the attached graphs. They
represent two runs of the following fio job. Obviously the results are
very reproducible.

[zero]
size=400m
rw=write
pre_read=1
ioengine=mmap

Here is the trace data for the first graph. I'll explain how every
single write is triggered. Vanilla kernels should show the same
behavior.

background threshold exceeded, so background flush is started
-------------------------------------------------------------
flush-8:0-2662 [005] 18.759459: writeback_single_inode: bdi 8:0: ino=131 state=I_DIRTY_SYNC|I_DIRTY_PAGES age=544 wrote=16385 to_write=-1 index=1
flush-8:0-2662 [000] 19.941272: writeback_single_inode: bdi 8:0: ino=131 state=I_DIRTY_SYNC|I_DIRTY_PAGES age=1732 wrote=16385 to_write=-1 index=16386
flush-8:0-2662 [000] 20.162497: writeback_single_inode: bdi 8:0: ino=131 state=I_DIRTY_SYNC|I_DIRTY_PAGES age=1952 wrote=4097 to_write=-1 index=32771

fio completes data population and does something like fsync().
Note that the dirty age is not reset by fsync().
-------------------------------------------------------------
<...>-2637 [000] 25.364145: fdatawrite_range: fio: bdi=8:0 ino=131 state=I_DIRTY_SYNC|I_DIRTY_PAGES start=0 end=9223372036854775807 sync=1 wrote=65533 skipped=0
<...>-2637 [004] 26.492765: fdatawrite_range: fio: bdi=8:0 ino=131 state=I_DIRTY_SYNC|I_DIRTY_PAGES start=0 end=9223372036854775807 sync=0 wrote=0 skipped=0

fio starts "rw=write" and triggers a background flush when the
background threshold is exceeded
----------------------------------------------------------
flush-8:0-2662 [000] 33.277084: writeback_single_inode: bdi 8:0: ino=131 state=I_DIRTY_PAGES age=15112 wrote=16385 to_write=-1 index=1
flush-8:0-2662 [000] 34.486721: writeback_single_inode: bdi 8:0: ino=131 state=I_DIRTY_SYNC|I_DIRTY_PAGES age=16324 wrote=16385 to_write=-1 index=16386
flush-8:0-2662 [000] 34.942939: writeback_single_inode: bdi 8:0: ino=131 state=I_DIRTY_SYNC|I_DIRTY_PAGES age=16784 wrote=8193 to_write=-1 index=32771

5 seconds later, the kupdate flush starts to work on expired inodes in
b_io *as well as* whatever inodes are already in the b_more_io list.
Unfortunately inode 131 was moved to b_more_io by the previous
background flush and has been sitting there ever since.
---------------------------------------------------------------------
flush-8:0-2662 [004] 39.951920: writeback_single_inode: bdi 8:0: ino=131 state=I_DIRTY_SYNC|I_DIRTY_PAGES age=21808 wrote=16385 to_write=-1 index=40964
flush-8:0-2662 [000] 40.784427: writeback_single_inode: bdi 8:0: ino=131 state=I_DIRTY_SYNC|I_DIRTY_PAGES age=22644 wrote=16385 to_write=-1 index=57349
flush-8:0-2662 [000] 41.840671: writeback_single_inode: bdi 8:0: ino=131 state=I_DIRTY_SYNC|I_DIRTY_PAGES age=23704 wrote=8193 to_write=-1 index=73734
flush-8:0-2662 [004] 42.845739: writeback_single_inode: bdi 8:0: ino=131 state=I_DIRTY_SYNC|I_DIRTY_PAGES age=24712 wrote=8193 to_write=-1 index=81927
flush-8:0-2662 [004] 43.309379: writeback_single_inode: bdi 8:0: ino=131 state=I_DIRTY_SYNC|I_DIRTY_PAGES age=25180 wrote=8193 to_write=-1 index=90120
flush-8:0-2662 [000] 43.547443: writeback_single_inode: bdi 8:0: ino=131 state=I_DIRTY_SYNC age=25416 wrote=4088 to_write=12296 index=0

This may be a bit surprising, but it should not be a big problem. After
all, vm.dirty_expire_centisecs=30s merely says that dirty inodes will
be put to IO _within_ 35s. The kernel still has some freedom to start
writeback earlier than that deadline, or even to miss the deadline when
the IO system is too busy.

Thanks,
Fengguang
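[For the curious, the 35s deadline mentioned above follows from two knobs:
an inode becomes expired vm.dirty_expire_centisecs after it was first
dirtied, but in the worst case it is only picked up at the next kupdate
wakeup, one vm.dirty_writeback_centisecs interval later. A minimal shell
sketch, assuming the default values discussed in this thread (30s expire,
5s kupdate interval); the variable names are illustrative, not kernel
symbols:]

```shell
# Worst-case writeback deadline, derived from the two vm knobs.
# Values mirror the defaults discussed in this thread; check
# /proc/sys/vm/dirty_expire_centisecs and
# /proc/sys/vm/dirty_writeback_centisecs on your own system.
dirty_expire_cs=3000      # inode expires 30s after first being dirtied
dirty_writeback_cs=500    # kupdate flusher wakes up every 5s

# Worst case: the inode expires just after a kupdate pass, so it waits
# one full wakeup interval before the flusher picks it up.
deadline_cs=$((dirty_expire_cs + dirty_writeback_cs))
echo "worst-case writeback deadline: $((deadline_cs / 100))s"
# prints: worst-case writeback deadline: 35s
```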
Attachment:
global-dirty-state.png
Description: PNG image
Attachment:
global-dirty-state.png
Description: PNG image