Re: [PATCH 28/30] xfs: rework xfs_iflush_cluster() dirty inode iteration

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 11 Jun 2020 09:40:08 +1000

On Wed, Jun 10, 2020 at 09:06:28AM -0400, Brian Foster wrote:
> On Wed, Jun 10, 2020 at 08:01:39AM +1000, Dave Chinner wrote:
> > On Tue, Jun 09, 2020 at 09:11:55AM -0400, Brian Foster wrote:
> > > On Thu, Jun 04, 2020 at 05:46:04PM +1000, Dave Chinner wrote:
> > > > -		 * check is not sufficient.
> > > > +		 * If we are shut down, unpin and abort the inode now as there
> > > > +		 * is no point in flushing it to the buffer just to get an IO
> > > > +		 * completion to abort the buffer and remove it from the AIL.
> > > >  		 */
> > > > -		if (!cip->i_ino) {
> > > > -			xfs_ifunlock(cip);
> > > > -			xfs_iunlock(cip, XFS_ILOCK_SHARED);
> > > > +		if (XFS_FORCED_SHUTDOWN(mp)) {
> > > > +			xfs_iunpin_wait(ip);
> > > 
> > > Note that we have an unlocked check above that skips pinned inodes.
> > 
> > Right, but we could be racing with a transaction commit that pinned
> > the inode and a shutdown. As the comment says: it's a quick and
> > dirty check to avoid trying to get locks when we know that it is
> > unlikely we can flush the inode without blocking. We still have to
> > recheck that state once we have the ILOCK....
> > 
> 
> Right, but that means we can just as easily skip the shutdown processing
> (which waits for unpin) if a particular inode is pinned. So which is
> supposed to happen in the shutdown case?
>
> ISTM that either could happen. As a result this kind of looks like
> random logic to me.

Yes, shutdown is racy, so it could be either. However, I'm not
changing the shutdown logic or handling here. If the shutdown race
could happen before this patchset (and it can), it can still happen
after the patchset, and this patchset does not change the way we
handle the shutdown race at all.

IOWs, while this shutdown logic may appear "random", that's not a
result of this patchset - it a result of design decisions made in
the last major shutdown rework/cleanup that required checks to be
added to places that could hang waiting for an event that would
never occur because shutdown state prevented it from occurring.

There's already enough complexity in this patchset that adding
shutdown logic changes is just too much to ask for.  If we want to
change how various shutdown logics work, lets do it as a separate
set of changes so all the subtle bugs that result from the changes
bisect to the isolated shutdown logic changes...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx