On Wed, Oct 07, 2009 at 04:53:20PM +0800, Peter Zijlstra wrote: > On Wed, 2009-10-07 at 15:38 +0800, Wu Fengguang wrote: > > plain text document attachment (writeback-nfs-request-queue.patch) > > The generic writeback routines are departing from congestion_wait() > > in preferance of get_request_wait(), aka. waiting on the block queues. > > > > Introduce the missing writeback wait queue for NFS, otherwise its > > writeback pages may grow out of control. > > > > In perticular, balance_dirty_pages() will exit after it pushes > > write_chunk pages into the PG_writeback page pool, _OR_ when the > > background writeback work quits. The latter is new behavior, and could > > not only quit (normally) after below background threshold, but also > > quit when it finds _zero_ dirty pages to write. The latter case gives > > rise to the number of PG_writeback pages if it is not explicitly limited. > > > > CC: Jens Axboe <jens.axboe@xxxxxxxxxx> > > CC: Chris Mason <chris.mason@xxxxxxxxxx> > > CC: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> > > CC: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> > > Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx> > > --- > > > > The wait time and network throughput varies a lot! this is a major problem. > > This means nfs_end_page_writeback() is not called smoothly over time, > > even when there are plenty of PG_writeback pages on the client side. > > Could that be some ack batching from the nfs protocol/server? Yes possibly. Another possibility is some works in the nfsiod workqueue takes up a long time. > > +static void nfs_wakeup_congested(long nr, long limit, > > + struct backing_dev_info *bdi, > > + wait_queue_head_t *wqh) > > +{ > > + if (nr < 2*limit - min(limit/8, NFS_WAIT_PAGES)) { > > + if (test_bit(BDI_sync_congested, &bdi->state)) > > + clear_bdi_congested(bdi, BLK_RW_SYNC); > > + if (waitqueue_active(&wqh[BLK_RW_SYNC])) { > > + smp_mb__after_clear_bit(); > > + wake_up(&wqh[BLK_RW_SYNC]); > > + } > > + } > > + if (nr < limit - min(limit/8, NFS_WAIT_PAGES)) { > > + if (test_bit(BDI_async_congested, &bdi->state)) > > + clear_bdi_congested(bdi, BLK_RW_ASYNC); > > + if (waitqueue_active(&wqh[BLK_RW_ASYNC])) { > > + smp_mb__after_clear_bit(); > > + wake_up(&wqh[BLK_RW_ASYNC]); > > + } > > + } > > +} > > > > wakeup implies a full memory barrier. If so, this smp_mb__after_clear_bit() line is also not necessary? void clear_bdi_congested(struct backing_dev_info *bdi, int sync) { //... clear_bit(bit, &bdi->state); smp_mb__after_clear_bit(); if (waitqueue_active(wqh)) wake_up(wqh); Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html