Hi Jens, now I've found just two minor things (see below). Besides them the only thing which is blocking my ack is a way to effectively lookup a BDI from a superblock so that we can reasonably effectively fsync a superblock... Honza On Fri 04-09-09 14:04:07, Jens Axboe wrote: > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index 45ad4bb..c86492c 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > +static long wb_check_old_data_flush(struct bdi_writeback *wb) > +{ > + unsigned long expired; > + long nr_pages; > + > + expired = wb->last_old_flush + > + msecs_to_jiffies(dirty_writeback_interval * 10); > + if (time_before(jiffies, expired)) > + return 0; > + > + nr_pages = global_page_state(NR_FILE_DIRTY) + > + global_page_state(NR_UNSTABLE_NFS) + > + (inodes_stat.nr_inodes - inodes_stat.nr_unused); > + > + return wb_writeback(wb, nr_pages, NULL, WB_SYNC_NONE, 1); > +} > + > +/* > + * Retrieve work items and do the writeback they describe > + */ > +long wb_do_writeback(struct bdi_writeback *wb, int force_wait) > +{ > + struct backing_dev_info *bdi = wb->bdi; > + struct bdi_work *work; > + long nr_pages, wrote = 0; > + > + while ((work = get_next_work_item(bdi, wb)) != NULL) { > + enum writeback_sync_modes sync_mode; > + > + nr_pages = work->nr_pages; > + > + /* > + * Override sync mode, in case we must wait for completion > + */ > + if (force_wait) > + work->sync_mode = sync_mode = WB_SYNC_ALL; > + else > + sync_mode = work->sync_mode; > + > + /* > + * If this isn't a data integrity operation, just notify > + * that we have seen this work and we are now starting it. > + */ > + if (sync_mode == WB_SYNC_NONE) > + wb_clear_pending(wb, work); > + > + wrote += wb_writeback(wb, nr_pages, work->sb, sync_mode, 0); > + > + /* > + * This is a data integrity writeback, so only do the > + * notification when we have completed the work. > + */ > + if (sync_mode == WB_SYNC_ALL) > + wb_clear_pending(wb, work); > + } > + > + /* > + * Check for periodic writeback, kupdated() style > + */ > + if (!wrote) > + wrote = wb_check_old_data_flush(wb); Why is here the !wrote check? It would feel safer if we just did wrote += wb_check_old_data_flush(wb); Otherwise we cannot guarantee syncing of inodes every writeback_interval. > +/* > + * Schedule writeback for all backing devices. Expensive! If this is a data > + * integrity operation, writeback will be complete when this returns. If > + * we are simply called for WB_SYNC_NONE, then writeback will merely be > + * scheduled to run. > + */ > +static void bdi_writeback_all(struct writeback_control *wbc) > +{ > + const bool must_wait = wbc->sync_mode == WB_SYNC_ALL; > + struct backing_dev_info *bdi; > + struct bdi_work *work; > + LIST_HEAD(list); > + > +restart: > + spin_lock(&bdi_lock); > > - filemap_fdatawait(mapping); > + list_for_each_entry(bdi, &bdi_list, bdi_list) { > + struct bdi_work *work; > + > + if (!bdi_has_dirty_io(bdi)) > + continue; > + > + /* > + * If work allocation fails, do the writes inline. We drop > + * the lock and restart the list writeout. This should be OK, > + * since this happens rarely and because the writeout should > + * eventually make more free memory available. > + */ > + work = bdi_alloc_work(wbc); > + if (!work) { > + struct writeback_control __wbc = *wbc; > > - cond_resched(); > + /* > + * Not a data integrity writeout, just continue > + */ > + if (!must_wait) > + continue; > > - spin_lock(&inode_lock); > + spin_unlock(&bdi_lock); > + __wbc = *wbc; You initialize the variable twice... -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html