Re: [PATCH v2 5/7] fs: notify superblocks of backing-device death

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 1 Dec 2015 15:03:36 +1100

On Wed, Nov 25, 2015 at 11:11:00PM -0800, Dan Williams wrote:
> On Wed, Nov 25, 2015 at 10:27 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Wed, Nov 25, 2015 at 02:09:34PM -0800, Dan Williams wrote:
> >> On Wed, Nov 25, 2015 at 1:50 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >> > On Wed, Nov 25, 2015 at 10:37:24AM -0800, Dan Williams wrote:
> [..]
> >> That sounds good in theory.  However, in the case of xfs it is already
> >> calling xfs_force_shutdown(),
> >
> > Where?
> 
> In the trace I included in the cover letter.
> 
> Testing this patch set reveals that xfs needs more XFS_FORCED_SHUTDOWN
> checks, especially in the unmount path.  Currently we deadlock here on
> umount after block device removal:

What's the test case?

>From what you've said, it appears to be another manifestation where
we trying to recovery from non-fatal IO errors that are being
reported by the block device, but that is short-cutting the path
that normally does shutdown detection on metadata buffer IO
completion.

Filesystem error handling coul dbe a lot more simple if we didnt'
have to try to guess what EIO from the block device actually means.
If device unplug triggered a shutdown, we wouldn't have to care
about retrying in many cases where we do right now because shutdown
then has clear priority of all other errors we need to handle. RIgh
tnow we just have ot guess, and there's a long history of corner
cases where we have guessed wrong....

I have patches to fix this particular manifestation, but until we
have have notification that devices hav ebeen unplugged and are not
coming back we're going to continue to struggle to get this right
and hence have re-occurring bugs when users pull USB drives out from
under active filesystems.....

> >> None of this fixes the problem of dax mappings leaking past block
> >> device remove.  That can be done generically without needing per-fs
> >> code.
> >
> > No, the shutdown is intended to close the race between the device
> > being removed, the mappings being invalidated and the filesytem
> > racing creating new mappings during page faults because it doesn't
> > know the device has been unplugged until it does IO that gets some
> > error in an unrecoverable path...
> 
> So you want me to grow a ->sb_shutdown() method for each fs that
> supports dax only to call unmap_mapping_range on each dax inode when
> common code could have already walked the inode list.

No, I don't want you to implement some whacky, dax only
->sb_shutdown method. I want a notification method to be implemented
so that each filesystem can take the necessary action it requires
when the underlying block device goes away.

Yes, that *filesystem specific method* might involve invalidating
all the dirty inodes in the filesystem, and if there are DAX
mappings in the filesystem then invalidating them, too, but that's
something the filesystem needs to take care of because *all
filesystem shutdowns must do this*.

Do you see the distinction there? This isn't just about device
unplug - there are lots of different triggers for a filesystem
shutdown. However, regardless of the shutdown trigger, we have to
take the same action. That is, we have to prevent all pending and
future IO from being issued to the block device, *even if the block
device is still OK*.

For example, if we detect a free space corruption during allocation,
it is not safe to trust *any active mapping* because we can't trust
that we having handed out the same block to multiple owners. Hence
on such a filesystem shutdown, we have to prevent any new DAX
mapping from occurring and invalidate all existing mappings as we
cannot allow userspace to modify any data or metadata until we've
resolved the corruption situation.

The simple fact is that a /filesystem/ shutdown needs to do DAX
mapping invalidation regardless of whether the block device has been
unplugged or not. This is not a case of "this only happens when we
unplug the device", this is a user data protection mechanism that we
use to prevent corruption propagation once it has been detected. A
device unplug is just one type of "corruption" that can occur.

Hence putting all this special invalidation stuff into the block
devices is simply duplicating functionality we need to implement in
the filesystem layers.  Regardless of this, it should be done by the
filesystem layers because that is where all the information necesary
to determine what needs invalidation is stored.

So, from the block dvice perspective, the *only thing it needs to
do* is notify the filesystem that it needs to shut down, and the
filesystem should then take care of everything else. Device unplug
is not some special snowflake that needs to be treated differently
from all other types of "it's dead, Jim" filesystem errors - from
the FS perspective it's the dead simple case because there are no
serious consequences if we screw up. i.e. if the FS gets it wrong
and IO is still issued after the shutdown, the IO is going to be
errored out rather than corrupting something on disk that would have
otherwise been OK....

> Also separately
> teach each fs to start returning errors on get_blocks() after shutdown
> even though fs/dax.c can figure it out from the return value from
> blk_queue_enter?

They will already return errors from get_blocks.

STATIC int
__xfs_get_blocks(
        struct inode            *inode,
        sector_t                iblock,
        struct buffer_head      *bh_result,
        int                     create,
        bool                    direct,
        bool                    dax_fault)
{
        struct xfs_inode        *ip = XFS_I(inode);
        struct xfs_mount        *mp = ip->i_mount;
        xfs_fileoff_t           offset_fsb, end_fsb;
        int                     error = 0;
        int                     lockmode = 0;
        struct xfs_bmbt_irec    imap;
        int                     nimaps = 1;
        xfs_off_t               offset;
        ssize_t                 size;
        int                     new = 0;

        if (XFS_FORCED_SHUTDOWN(mp))
                return -EIO;
.....

Hmmm? First thing XFS does in get_blocks() is check for shutdown.
In fact, the first thing that every major subsystem entry point
does in XFs is check for shutdown. Let's look at a typical
allocation get_blocks call in XFS:

 xfs_get_blocks
   __xfs_get_blocks - has shutdown check
    xfs_bmapi_read - shutdown check is second after error injection
    xfs_iomap_write_direct
      xfs_trans_reserve - has shutdown check
      xfs_bmapi_write - has shutdown check
      xfs_trans_commit - has shutdown check

IOWs, there are shutdown checks all through the get_blocks callbacks
already, so it's hardly a caase of you having to go an add support
to any filesystem for this...

Cheers,
MDave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html