Re: [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun 18-09-11 22:07:37, Wu Fengguang wrote:
> On Thu, Sep 08, 2011 at 11:03:40PM +0800, Jan Kara wrote:
> > On Thu 08-09-11 09:22:36, Wu Fengguang wrote:
> > > Jan,
> > > 
> > > > @@ -420,6 +421,8 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
> > > >  	/* Don't write the inode if only I_DIRTY_PAGES was set */
> > > >  	if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
> > > >  		int err = write_inode(inode, wbc);
> > > > +		if (!err)
> > > > +			inode_written = true;
> > > >  		if (ret == 0)
> > > >  			ret = err;
> > > >  	}
> > > 
> > > write_inode() typically return error after redirtying the inode.
> > > So the conditions inode_written=false and (inode->i_state & I_DIRTY)
> > > are mostly on/off together. For the cases they disagree, it's probably
> > > a filesystem bug -- at least I don't think some FS will deliberately 
> > > return success while redirtying the inode, or the reverse.
> >   There is a possibility someone else redirties the inode between the moment
> > I_DIRTY bits are cleared in writeback_single_inode() and the check for
> > I_DIRTY is done after ->write_inode() is called. Especially when
> > write_inode() blocks waiting for some IO this isn't that hard to happen. So
> > there are valid (although relatively rare) cases when inode_written is
> > different from the result of I_DIRTY check.
> 
> Ah yes, that's good point.
> 
> > > >  		} else if (inode->i_state & I_DIRTY) {
> > > >  			/*
> > > >  			 * Filesystems can dirty the inode during writeback
> > > >  			 * operations, such as delayed allocation during
> > > >  			 * submission or metadata updates after data IO
> > > > -			 * completion.
> > > > +			 * completion. Also inode could have been dirtied by
> > > > +			 * some process aggressively touching metadata.
> > > > +			 * Finally, filesystem could just fail to write the
> > > > +			 * inode for some reason. We have to distinguish the
> > > > +			 * last case from the previous ones - in the last case
> > > > +			 * we want to give the inode quick retry, in the
> > > > +			 * other cases we want to put it back to the dirty list
> > > > +			 * to avoid livelocking of writeback.
> > > >  			 */
> > > > -			redirty_tail(inode, wb);
> > > > +			if (inode_written)
> > > > +				redirty_tail(inode, wb);
> > > 
> > > Can you elaborate the livelock in the below inode_written=true case?
> > > Why the sleep in the wb_writeback() loop is not enough?
> >   In case someone would be able to consistently trigger the race window and
> > redirty the inode before we check here, we would loop for a long time
> > always writing just this inode and thus effectivelly stalling other
> > writeback. That's why I push redirtied inode behind other inodes in the
> > dirty list.
> 
> Agreed. All the left to do is to confirm whether this addresses
> Christoph's original problem.
> 
> Acked-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
  Great, thanks for review! I'll resend the two patches to Christoph so
that he can try them.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux