----- Original Message ----
> From: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> To: Martin Knoblauch <spamtrap@xxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>; "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>; "Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx>; Peter Staubach <staubach@xxxxxxxxxx>; linux-fsdevel@xxxxxxxxxxxxxxx
> Sent: Tue, November 10, 2009 2:08:18 PM
> Subject: Re: Likley stupid question on "throttle_vm_writeout"
>
> On Tue, Nov 10, 2009 at 08:01:47PM +0800, Martin Knoblauch wrote:
> > ----- Original Message ----
> >
> > > From: Wu Fengguang
> > > To: Peter Zijlstra
> > > Cc: Martin Knoblauch ; linux-kernel@xxxxxxxxxxxxxxx
> > > Sent: Tue, November 10, 2009 3:08:58 AM
> > > Subject: Re: Likley stupid question on "throttle_vm_writeout"
> > >
> > > On Mon, Nov 09, 2009 at 04:26:33PM +0100, Peter Zijlstra wrote:
> > > > On Mon, 2009-11-09 at 07:15 -0800, Martin Knoblauch wrote:
> > > > > Hi, (please CC me on replies)
> > > > >
> > > > > I have a likely stupid question on the function "throttle_vm_writeout".
> > > > > Looking at the code I find:
> > > > >
> > > > >         if (global_page_state(NR_UNSTABLE_NFS) +
> > > > >                 global_page_state(NR_WRITEBACK) <= dirty_thresh)
> > > > >                         break;
> > > > >         congestion_wait(WRITE, HZ/10);
> > > > >
> > > > > Shouldn't the NR_FILE_DIRTY pages be considered as well?
> > > >
> > > > Ha, you just trod onto a piece of ugly I'd totally forgotten about ;-)
> > > >
> > > > The intent of throttle_vm_writeout() is to limit the total pages in
> > > > writeout and to wait for them to go-away.
> > >
> > > Like this:
> > >
> > >         vmscan fast => large NR_WRITEBACK => throttle vmscan based on it
> > >
> > > > Everybody hates the function, nobody managed to actually come up with
> > > > anything better.
> > >
> > > btw, here is another reason to limit NR_WRITEBACK: I saw many
> > > throttle_vm_writeout() waits if there is no wait queue to limit
> > > NR_WRITEBACK (eg. NFS). In that case the (steadily) big NR_WRITEBACK
> > > is _not_ caused by fast vmscan..
> > >
> >
> >  That is exactly what made me look again into the code. My observation is
> > that when doing something like:
> >
> > dd if=/dev/zero of=fast-local-disk bs=1M count=15000
> >
> > most of the "dirty" pages are in NR_FILE_DIRTY with some relatively small
> > amount (10% or so) in NR_WRITEBACK. If I do:
> >
> > dd if=/dev/zero of=some-nfs-mount bs=1M count=15000
> >
> > NR_WRITEBACK almost immediately goes up to dirty_ratio, with
> > NR_UNSTABLE_NFS small. Over time NR_UNSTABLE_NFS grows, but is
> > always lower than NR_WRITEBACK (maybe 40/60).
>
> This is interesting, though I don't see explicit NFS code to limit
> NR_UNSTABLE_NFS. Maybe there are some implicit rules.
>
> > But don't ask what happens if I do both in parallel.... The local
> > IO really slows to a crawl and sometimes the system just becomes
> > very unresponsive. Have we heard that before? :-)
>
> You may be the first reporter as far as I can tell :)
>

 Oh come on :-) I (and others) have reported bad writeout behaviour for
years. But maybe not in the combination of local and NFS I/O.

> > Somehow I have the impression that NFS writeout is able to
> > absolutely dominate the dirty pages to an extent that the system is
> > unusable.
>
> This is why I want to limit NR_WRITEBACK for NFS:
>
>         [PATCH] NFS: introduce writeback wait queue
>         http://lkml.org/lkml/2009/10/3/198
>

 Thanks. I will have a look. Is 2.6.32.x OK for testing?

Cheers
Martin
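
For anyone following the thread, here is a rough userspace sketch of the check under discussion. It is not kernel code: struct page_counts and both function names are made up for illustration, and the counters are plain numbers passed in by the caller rather than values read via global_page_state(). The first function mirrors the quoted condition (the loop keeps waiting while NR_UNSTABLE_NFS + NR_WRITEBACK exceeds dirty_thresh); the second shows the variant the original question asks about, with NR_FILE_DIRTY added to the sum.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the three counters mentioned in the thread. */
struct page_counts {
    unsigned long nr_file_dirty;    /* stands in for NR_FILE_DIRTY */
    unsigned long nr_writeback;     /* stands in for NR_WRITEBACK */
    unsigned long nr_unstable_nfs;  /* stands in for NR_UNSTABLE_NFS */
};

/*
 * Mirrors the quoted condition: the loop breaks when
 * unstable + writeback <= dirty_thresh, so it keeps waiting otherwise.
 */
bool should_throttle_quoted(const struct page_counts *pc,
                            unsigned long dirty_thresh)
{
    return pc->nr_unstable_nfs + pc->nr_writeback > dirty_thresh;
}

/*
 * Variant from the question (an assumption, not what the kernel does):
 * also count dirty file pages that have not been submitted yet.
 */
bool should_throttle_with_dirty(const struct page_counts *pc,
                                unsigned long dirty_thresh)
{
    return pc->nr_file_dirty + pc->nr_unstable_nfs + pc->nr_writeback
           > dirty_thresh;
}
```

With a local-disk style snapshot (most pages in nr_file_dirty, few in nr_writeback) only the second function reports throttling, while an NFS style snapshot (nr_writeback near the threshold) trips both. That matches the difference between the two dd runs described above, at least in this simplified model.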