Re: [PATCH RFC] mm: Implement balance_dirty_pages() through waiting for flusher thread

Jan Kara <jack@xxxxxxx> · Mon, 21 Jun 2010 16:02:37 +0200



On Fri 18-06-10 12:21:36, Peter Zijlstra wrote:
> On Thu, 2010-06-17 at 20:04 +0200, Jan Kara wrote:
> > +/* Wait until write_chunk is written or we get below dirty limits */
> > +void bdi_wait_written(struct backing_dev_info *bdi, long write_chunk)
> > +{
> > +       struct bdi_written_count wc = {
> > +                                       .list = LIST_HEAD_INIT(wc.list),
> > +                                       .written = write_chunk,
> > +                               };
> > +       DECLARE_WAITQUEUE(wait, current);
> > +       int pause = 1;
> > +
> > +       bdi_add_writer(bdi, &wc, &wait);
> > +       for (;;) {
> > +               if (signal_pending_state(TASK_KILLABLE, current))
> > +                       break;
> > +
> > +               /*
> > +                * Make the task just killable so that tasks cannot circumvent
> > +                * throttling by sending themselves non-fatal signals...
> > +                */
> > +               __set_current_state(TASK_KILLABLE);
> > +               io_schedule_timeout(pause);
> > +
> > +               /*
> > +                * The following check is save without wb_written_wait.lock
> > +                * because once bdi_remove_writer removes us from the list
> > +                * noone will touch us and it's impossible for list_empty check
> > +                * to trigger as false positive. The barrier is there to avoid
> > +                * missing the wakeup when we are removed from the list.
> > +                */
> > +               smp_rmb();
> > +               if (list_empty(&wc.list))
> > +                       break;
> > +
> > +               if (!dirty_limits_exceeded(bdi))
> > +                       break;
> > +
> > +               /*
> > +                * Increase the delay for each loop, up to our previous
> > +                * default of taking a 100ms nap.
> > +                */
> > +               pause <<= 1;
> > +               if (pause > HZ / 10)
> > +                       pause = HZ / 10;
> > +       }
> > +
> > +       spin_lock_irq(&bdi->wb_written_wait.lock);
> > +       __remove_wait_queue(&bdi->wb_written_wait, &wait);
> > +       if (!list_empty(&wc.list))
> > +               bdi_remove_writer(bdi, &wc);
> > +       spin_unlock_irq(&bdi->wb_written_wait.lock);
> > +}
> 
> OK, so the whole pause thing is simply because we don't get a wakeup
> when we drop below the limit, right?
  Yes. I will add a comment about it before the loop. I was also thinking
about sending a wakeup when we get below the limits, but then all threads would
start thundering the device at the same time and likely cause congestion
again. With the increasing pause we might get a smoother start. But I'll have
to measure whether we aren't too unfair with this approach...
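
  To make the backoff concrete, here is a rough userspace sketch of how the
pause grows in the loop above. It is not part of the patch: SKETCH_HZ,
next_pause() and total_sleep() are made-up names for illustration, and
HZ=1000 is only an assumption (the real value depends on the kernel config).

```c
/* Assumed stand-in for the kernel's HZ; the real value is config dependent. */
#define SKETCH_HZ 1000

/*
 * Hypothetical helper mirroring the end of the loop body:
 * double the pause and cap it at HZ / 10, i.e. a 100ms nap.
 */
static long next_pause(long pause)
{
	pause <<= 1;
	if (pause > SKETCH_HZ / 10)
		pause = SKETCH_HZ / 10;
	return pause;
}

/*
 * Hypothetical helper: upper bound on the jiffies slept over the first
 * 'loops' iterations, starting from pause = 1 as the patch does. A real
 * waiter can sleep less because a wakeup ends the timeout early.
 */
static long total_sleep(int loops)
{
	long pause = 1, total = 0;
	int i;

	for (i = 0; i < loops; i++) {
		total += pause;
		pause = next_pause(pause);
	}
	return total;
}
```

  Under that assumption the pause goes 1, 2, 4, ..., 64 and then stays at 100
jiffies, so a task that gets no wakeup rechecks the limits at most every
100ms once it has looped eight times.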

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR