On Wed, 1 Apr 2009 21:07:37 +0800 Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote: > On Wed, Apr 01, 2009 at 08:48:43PM +0800, Jeff Layton wrote: > > > > > --- mm.orig/fs/fs-writeback.c > > > > > +++ mm/fs/fs-writeback.c > > > > > @@ -196,7 +196,7 @@ static void redirty_tail(struct inode *i > > > > > struct inode *tail_inode; > > > > > > > > > > tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list); > > > > > - if (!time_after_eq(inode->dirtied_when, > > > > > + if (time_before(inode->dirtied_when, > > > > > tail_inode->dirtied_when)) > > > > > inode->dirtied_when = jiffies; > > > > > } > > > > > > > > I think we need a similar change in this function in order to maintain > > > > the list order. > > > > > > > > Consider this case: > > > > > > > > We have an s_dirty list with a head inode that appears to be in the > > > > future. We start writeback and clear out s_dirty (all of the inodes are > > > > moved to s_io). A new inode is dirtied, and goes onto the empty s_dirty > > > > list with a dirtied_when value that equals now. The inode with the > > > > dirtied_when value that looks like it's in the future is redirtied while > > > > being written and redirty_tail is called. It goes back on the list > > > > without resetting dirtied_when even though it's actually older than the > > > > inode at the tail. > > > > > > What's the difference? It _is_ the past because all 2 reference sites > > > are now taught to think so. > > > > > > So s_dirty is still in order, and the writeback process won't be blocked. > > > > > > > Sanity check -- my understanding is this: > > > > head == least-recently dirtied inode > > tail == most-recently dirtied inode > > > > ...if so, then we are violating the list order if we don't make a > > change to redirty_tail. We're putting an inode that's far in the past > > back onto the tail of the list without resetting dirtied_when. A more > > recently-dirtied inode will precede one that was dirtied less recently. > > > > Since the newly dirtied inode is closer to the head of the list, the > > older inode that's constantly being redirtied won't be written out > > until the newly dirtied one passes the older_than_this check (30s or > > so in the usual case). > > If you call that out-of-order, yes it is. Sadly it cannot be improved > by playing with dirtied_when: the _physical_ order is still the same. > > You know what? That's exactly the drawback of redirtying into s_dirty. > It's irrelevant to the resetting of dirtied_when. A new s_more_io_wait > is the only way to solve this problem. > Agreed. The consequences are also the same regardless of whether we update dirtied when -- a 30s delay in writeback when we redirty back onto s_dirty. Ok, I'm convinced. Ack on that patch since the behavior is the same regardless of whether we update dirtied_when. > > > > There is another option too that I'll throw out here... > > > > > > > > We could just make dirtied_when a 64 bit value on 32 bit machines and > > > > use jiffies_64 there. On the upside there is no "problematic > > > > window" with that. The downside is that struct inode would grow by 4 > > > > bytes on 32 bit arches, and checking jiffies_64 on such an arch is > > > > more computationally intensive. We'd also have to change the size of > > > > older_than_this value in the writeback_control struct too if we want to > > > > go this route... > > > > > > Yes that could eliminate the 30s or more temporary writeback stillness. > > > The only problem is the extra costs for normal cases, especially the > > > space cost. > > > > > > > Correct. I'm not necessarily advocating that approach but it's one to > > consider... > > > > If your s_more_io_wait patchset comes to fruition though then that > > change really won't be needed, so maybe it's best not to go that route. > > Sorry for the delay. But I'm still curious about the redirty > process&timing of NFS/XFS. The conflicts with Jens' per-bdi pdflush > patches are another concern... > No worries. Getting this correct is the most important thing. -- Jeff Layton <jlayton@xxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html