On Thu 27-11-14 18:00:16, Ted Tso wrote: > On Thu, Nov 27, 2014 at 02:14:21PM +0100, Jan Kara wrote: > > * change queue_io() to also call > > moved += move_expired_inodes(&wb->b_dirty_time, &wb->b_io, time + 24hours) > > For this you need to tweak move_expired_inodes() to take pointer to > > timestamp instead of pointer to work but that's trivial. Also you want > > probably leave time ->older_than_this value (i.e. without +24 hours) if > > we are doing WB_SYNC_ALL writeback. With this you can remove > > flush_sb_dirty_time() completely. > > Well.... it's not quite enough. The problem is that for ext3 and > ext4, the actual work of writing the inode happens in dirty_inode(), > not in write_inode(). Which means we need to do something like this. Right, I didn't realize this problem. > I'm not entirely sure whether or not this is too ugly to live; > personally, I think my hack of handling this in update_time() might be > preferable.... Actually handling the copying of timestamps in __writeback_single_inode() would look fine to me. You mention in your next email, calling mark_inode_dirty_sync() from flusher may be problematic - why? How is this any different from calling mark_inode_dirty_sync() from flush_sb_dirty_time()? I will note that for a while I thought copying the full inode to on-disk buffer may be problematic because inode may be in an intermediate state of some transactional change. But that isn't an issue - if there's any transactional change in progress, it has a handle open and until the change is node, thus the buffer with the partial change cannot go to the journal (transaction cannot commit) until mark_inode_dirty_sync() copies the final state of the inode. Another solution may be to convey the information that copying of timestamps is necessary to ->write_inode method. We could do that via a flag bit in writeback_control. Each filesystem can then copy timestamps when this bit is set. But calling mark_inode_dirty_sync() from __writeback_single_inode() looks simpler to me. Honza > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index b93c529..95a42b3 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -253,7 +253,7 @@ static bool inode_dirtied_after(struct inode *inode, unsigned long t) > */ > static int move_expired_inodes(struct list_head *delaying_queue, > struct list_head *dispatch_queue, > - struct wb_writeback_work *work) > + unsigned long *older_than_this) > { > LIST_HEAD(tmp); > struct list_head *pos, *node; > @@ -264,8 +264,8 @@ static int move_expired_inodes(struct list_head *delaying_queue, > > while (!list_empty(delaying_queue)) { > inode = wb_inode(delaying_queue->prev); > - if (work->older_than_this && > - inode_dirtied_after(inode, *work->older_than_this)) > + if (older_than_this && > + inode_dirtied_after(inode, *older_than_this)) > break; > list_move(&inode->i_wb_list, &tmp); > moved++; > @@ -309,9 +309,14 @@ out: > static void queue_io(struct bdi_writeback *wb, struct wb_writeback_work *work) > { > int moved; > + unsigned long one_day_later = jiffies + (HZ * 86400); > + > assert_spin_locked(&wb->list_lock); > list_splice_init(&wb->b_more_io, &wb->b_io); > - moved = move_expired_inodes(&wb->b_dirty, &wb->b_io, work); > + moved = move_expired_inodes(&wb->b_dirty, &wb->b_io, > + work->older_than_this); > + moved += move_expired_inodes(&wb->b_dirty_time, &wb->b_io, > + &one_day_later); > trace_writeback_queue_io(wb, work, moved); > } > > @@ -637,6 +642,17 @@ static long writeback_sb_inodes(struct super_block *sb, > } > > /* > + * If the inode is marked dirty time but is not dirty, > + * then at last for ext3 and ext4 we need to call > + * mark_inode_dirty_sync in order to get the inode > + * timestamp transferred to the on disk inode, since > + * write_inode is a no-op for those file systems. HACK HACK HACK > + */ > + if ((inode->i_state & I_DIRTY_TIME) && > + ((inode->i_state & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) == 0)) > + mark_inode_dirty_sync(inode); > + > + /* > * Don't bother with new inodes or inodes being freed, first > * kind does not need periodic writeout yet, and for the latter > * kind writeout is handled by the freer. > @@ -1233,9 +1249,10 @@ void inode_requeue_dirtytime(struct inode *inode) > spin_lock(&bdi->wb.list_lock); > spin_lock(&inode->i_lock); > if ((inode->i_state & I_DIRTY_WB) == 0) { > - if (inode->i_state & I_DIRTY_TIME) > + if (inode->i_state & I_DIRTY_TIME) { > + inode->dirtied_when = jiffies; > list_move(&inode->i_wb_list, &bdi->wb.b_dirty_time); > - else > + } else > list_del_init(&inode->i_wb_list); > } > spin_unlock(&inode->i_lock); > > Comments? > > - Ted -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html