On Fri 18-06-10 12:21:36, Peter Zijlstra wrote:
> On Thu, 2010-06-17 at 20:04 +0200, Jan Kara wrote:
> > +/* Wait until write_chunk is written or we get below dirty limits */
> > +void bdi_wait_written(struct backing_dev_info *bdi, long write_chunk)
> > +{
> > +	struct bdi_written_count wc = {
> > +			.list = LIST_HEAD_INIT(wc.list),
> > +			.written = write_chunk,
> > +		};
> > +	DECLARE_WAITQUEUE(wait, current);
> > +	int pause = 1;
> > +
> > +	bdi_add_writer(bdi, &wc, &wait);
> > +	for (;;) {
> > +		if (signal_pending_state(TASK_KILLABLE, current))
> > +			break;
> > +
> > +		/*
> > +		 * Make the task just killable so that tasks cannot circumvent
> > +		 * throttling by sending themselves non-fatal signals...
> > +		 */
> > +		__set_current_state(TASK_KILLABLE);
> > +		io_schedule_timeout(pause);
> > +
> > +		/*
> > +		 * The following check is save without wb_written_wait.lock
> > +		 * because once bdi_remove_writer removes us from the list
> > +		 * noone will touch us and it's impossible for list_empty check
> > +		 * to trigger as false positive. The barrier is there to avoid
> > +		 * missing the wakeup when we are removed from the list.
> > +		 */
> > +		smp_rmb();
> > +		if (list_empty(&wc.list))
> > +			break;
> > +
> > +		if (!dirty_limits_exceeded(bdi))
> > +			break;
> > +
> > +		/*
> > +		 * Increase the delay for each loop, up to our previous
> > +		 * default of taking a 100ms nap.
> > +		 */
> > +		pause <<= 1;
> > +		if (pause > HZ / 10)
> > +			pause = HZ / 10;
> > +	}
> > +
> > +	spin_lock_irq(&bdi->wb_written_wait.lock);
> > +	__remove_wait_queue(&bdi->wb_written_wait, &wait);
> > +	if (!list_empty(&wc.list))
> > +		bdi_remove_writer(bdi, &wc);
> > +	spin_unlock_irq(&bdi->wb_written_wait.lock);
> > +}
>
> OK, so the whole pause thing is simply because we don't get a wakeup
> when we drop below the limit, right?
  Yes. I will write a comment about it before the loop.
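To make the polling behaviour concrete, the backoff in the quoted loop can be sketched in userspace roughly as below. This is only an illustration, not kernel code: SKETCH_HZ, next_pause() and total_sleep() are hypothetical names that are not part of the patch, and a tick rate of 1000 is an assumption (HZ is a kernel build-time setting).

```c
#include <assert.h>

/*
 * Assumed tick rate for this userspace sketch. In the kernel, HZ is a
 * build-time constant; 1000 is just one possible value.
 */
#define SKETCH_HZ 1000

/*
 * Hypothetical helper: next sleep length in ticks, mirroring
 * "pause <<= 1; if (pause > HZ / 10) pause = HZ / 10;" from the patch.
 */
static long next_pause(long pause)
{
	assert(pause >= 1);
	pause <<= 1;
	if (pause > SKETCH_HZ / 10)
		pause = SKETCH_HZ / 10;
	return pause;
}

/*
 * Hypothetical helper: total ticks slept after `naps` iterations of the
 * loop, starting from pause = 1 as the patch does.
 */
static long total_sleep(int naps)
{
	long pause = 1, total = 0;
	int i;

	for (i = 0; i < naps; i++) {
		total += pause;
		pause = next_pause(pause);
	}
	return total;
}
```

Under those assumptions the naps run 1, 2, 4, ..., 64, 100, 100, ... ticks, so a writer that never gets a wakeup has slept 127 ticks after seven naps and from then on rechecks the dirty limits once every 100 ticks.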
I was also thinking about sending a wakeup when we get below the limits, but
then all threads would start thundering the device at the same time and
would likely cause congestion again. This way we might get a smoother start.
But I'll have to measure whether we aren't too unfair with this approach...

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR