On Wed, Apr 01, 2009 at 07:53:20PM +0800, Jeff Layton wrote: > On Wed, 1 Apr 2009 14:56:18 +0800 > Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote: > > > On Tue, Mar 31, 2009 at 06:07:30PM -0700, Andrew Morton wrote: > > > On Tue, 31 Mar 2009 20:50:18 -0400 Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > > On Tue, 31 Mar 2009 17:20:31 -0700 > > > > Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > > > > > > On Tue, 31 Mar 2009 20:03:59 -0400 > > > > > Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > > > > > > + * It's not sufficient to just do a time_after() check on > > > > > > + * dirtied_when. That assumes that dirtied_when will always > > > > > > + * change within a period of jiffies that encompasses half the > > > > > > + * machine word size (2^31 jiffies on 32-bit arch). That's not > > > > > > + * necessarily the case if an inode is being constantly > > > > > > + * redirtied. Since dirtied_when can never be in the future, > > > > > > + * we can assume that if it appears to be so then it is > > > > > > + * actually in the distant past. > > > > > > > > > > so this really is a 32-bit-only thing. > > > > > > > > > > I guess that isn't worth optimising for though. > > > > > > > > > > > > > Yeah, it's pretty much impossible to hit this on a 64-bit machine. > > > > > > > > > otoh, given that all three comparisons are the same: > > > > > > > > > > + time_after(inode->dirtied_when, *older_than_this) && > > > > > + time_before_eq(inode->dirtied_when, jiffies)) > > > > > > > > > > (although one is inverted (i think?)), it might end up nicer if this was all done > > > > > in a little helper function? > > > > > > > > > > That way we only need to comment what's going on at a single site, and > > > > > we could omit the additional test if !CONFIG_64BIT. > > > > > > > > Ok, that seems reasonable. > > > > > > > > At one point I had a macro similar to time_in_range(), but dropped it > > > > primarily because time_after_but_before_eq() wasn't easy on the eyes. > > > > Thoughts on better names? > > > > > > I was thinking > > > > > > bool inode_dirtied_after(...); > > > > > > and just leave the innards using time_after() and time_before_eq()? > > > > Andrew, here is the updated patch. Note that the first chunk for > > redirty_tail() was not absolutely necessary and so removed. > > > > Thanks, > > Fengguang > > --- > > Subject: writeback: guard against jiffies wraparound on inode->dirtied_when checks > > From: Jeff Layton <jlayton@xxxxxxxxxx> > > > > The dirtied_when value on an inode is supposed to represent the first > > time that an inode has one of its pages dirtied. This value is in units > > of jiffies. It's used in several places in the writeback code to > > determine when to write out an inode. > > > > The problem is that these checks assume that dirtied_when is updated > > periodically. If an inode is continuously being used for I/O it can be > > persistently marked as dirty and will continue to age. Once the time > > difference between dirtied_when and the jiffies value it is being > > compared to is greater than or equal to half the maximum of the jiffies > > type, the logic of the time_*() macros inverts and the opposite of what > > is needed is returned. On 32-bit architectures that's just under 25 days > > (assuming HZ == 1000). > > > > As the least-recently dirtied inode, it'll end up being the first one > > that pdflush will try to write out. sync_sb_inodes() does this check: > > > > /* Was this inode dirtied after sync_sb_inodes was called? */ > > if (time_after(inode->dirtied_when, start)) > > break; > > > > ...but now dirtied_when appears to be in the future. sync_sb_inodes() > > bails out without attempting to write any dirty inodes. When this > > occurs, pdflush will stop writing out inodes for this superblock. > > Nothing can unwedge it until jiffies moves out of the problematic > > window. > > > > Fix this problem by changing the checks against dirtied_when to also > > check whether it appears to be in the future. If it does, then we > > consider the value to be far in the past. > > > > This should shrink the problematic window of time to such a small > > period(30s) as not to matter. > > > > Acked-by: Ian Kent <raven@xxxxxxxxxx> > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx> > > --- > > fs/fs-writeback.c | 26 ++++++++++++++++++++++---- > > 1 file changed, 22 insertions(+), 4 deletions(-) > > > > --- mm.orig/fs/fs-writeback.c > > +++ mm/fs/fs-writeback.c > > @@ -196,7 +196,7 @@ static void redirty_tail(struct inode *i > > struct inode *tail_inode; > > > > tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list); > > - if (!time_after_eq(inode->dirtied_when, > > + if (time_before(inode->dirtied_when, > > tail_inode->dirtied_when)) > > inode->dirtied_when = jiffies; > > } > > I think we need a similar change in this function in order to maintain > the list order. > > Consider this case: > > We have an s_dirty list with a head inode that appears to be in the > future. We start writeback and clear out s_dirty (all of the inodes are > moved to s_io). A new inode is dirtied, and goes onto the empty s_dirty > list with a dirtied_when value that equals now. The inode with the > dirtied_when value that looks like it's in the future is redirtied while > being written and redirty_tail is called. It goes back on the list > without resetting dirtied_when even though it's actually older than the > inode at the tail. What's the difference? It _is_ the past because all 2 reference sites are now taught to think so. So s_dirty is still in order, and the writeback process won't be blocked. > There is another option too that I'll throw out here... > > We could just make dirtied_when a 64 bit value on 32 bit machines and > use jiffies_64 there. On the upside there is no "problematic > window" with that. The downside is that struct inode would grow by 4 > bytes on 32 bit arches, and checking jiffies_64 on such an arch is > more computationally intensive. We'd also have to change the size of > older_than_this value in the writeback_control struct too if we want to > go this route... Yes that could eliminate the 30s or more temporary writeback stillness. The only problem is the extra costs for normal cases, especially the space cost. Thanks, Fengguang > > > @@ -220,6 +220,21 @@ static void inode_sync_complete(struct i > > wake_up_bit(&inode->i_state, __I_SYNC); > > } > > > > +static bool inode_dirtied_after(struct inode *inode, unsigned long t) > > +{ > > + bool ret = time_after(inode->dirtied_when, t); > > +#ifndef CONFIG_64BIT > > + /* > > + * For inodes being constantly redirtied, dirtied_when can get stuck. > > + * It _appears_ to be in the future, but is actually in distant past. > > + * This test is necessary to prevent such wrapped-around relative times > > + * from permanently stopping the whole pdflush writeback. > > + */ > > + ret = ret && time_before_eq(inode->dirtied_when, jiffies); > > +#endif > > + return ret; > > +} > > + > > /* > > * Move expired dirty inodes from @delaying_queue to @dispatch_queue. > > */ > > @@ -231,7 +246,7 @@ static void move_expired_inodes(struct l > > struct inode *inode = list_entry(delaying_queue->prev, > > struct inode, i_list); > > if (older_than_this && > > - time_after(inode->dirtied_when, *older_than_this)) > > + inode_dirtied_after(inode, *older_than_this)) > > break; > > list_move(&inode->i_list, dispatch_queue); > > } > > @@ -492,8 +507,11 @@ void generic_sync_sb_inodes(struct super > > continue; /* blockdev has wrong queue */ > > } > > > > - /* Was this inode dirtied after sync_sb_inodes was called? */ > > - if (time_after(inode->dirtied_when, start)) > > + /* > > + * Was this inode dirtied after sync_sb_inodes was called? > > + * This keeps sync from extra jobs and livelock. > > + */ > > + if (inode_dirtied_after(inode, start)) > > break; > > > > /* Is another pdflush already flushing this queue? */ > > > -- > Jeff Layton <jlayton@xxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html