Re: [PATCH 1/1] xfs: fallback to readonly during recovery

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 12 Feb 2020 07:04:30 +1100

On Tue, Feb 11, 2020 at 08:04:01AM -0600, Vincent Fazio wrote:
> All,
> 
> On 2/11/20 6:55 AM, Brian Foster wrote:
> > On Mon, Feb 10, 2020 at 05:40:03PM -0600, Eric Sandeen wrote:
> > > On 2/10/20 4:31 PM, Aaron Sierra wrote:
> > > > > From: "Eric Sandeen" <sandeen@xxxxxxxxxxx>
> > > > > Sent: Monday, February 10, 2020 3:43:50 PM
> > > > > On 2/10/20 3:10 PM, Vincent Fazio wrote:
> > > > > > Previously, XFS would fail to mount if there was an error during log
> > > > > > recovery. This can occur as a result of inevitable I/O errors when
> > > > > > trying to apply the log on read-only ATA devices since the ATA layer
> > > > > > does not support reporting a device as read-only.
> > > > > > 
> > > > > > Now, if there's an error during log recovery, fall back to norecovery
> > > > > > mode and mark the filesystem as read-only in the XFS and VFS layers.
> > > > > > 
> > > > > > This roughly approximates the 'errors=remount-ro' mount option in ext4
> > > > > > but is implicit and the scope only covers errors during log recovery.
> > > > > > Since XFS is the default filesystem for some distributions, this change
> > > > > > allows users to continue to use XFS on these read-only ATA devices.
> > > > > What is the workload or scenario where you need this behavior?
> > > > > 
> > > > > I'm not a big fan of ~silently mounting a filesystem with latent errors,
> > > > > tbh, but maybe you can explain a bit more about the problem you're solving
> > > > > here?
> > > > Hi Eric,
> > > > 
> > > > We use SSDs from multiple vendors that can be configured at power-on (via
> > > > GPIO) to be read-write or write-protected. When write-protected we get I/O
> > > > errors for any writes that reach the device. We believe that behavior is
> > > > correct.
> > > > 
> > > > We have found that XFS fails during log recovery even when the log is clean
> > > > (apparently due to metadata writes immediately before actual recovery).
> > > There should be no log recovery if it's clean ...
> > > 
> > > And I don't see that here - a clean log on a readonly device simply mounts
> > > RO for me by default, with no muss, no fuss.
> > > 
> > > # mkfs.xfs -f fsfile
> > > ...
> > > # losetup /dev/loop0 fsfile
> > > # mount /dev/loop0 mnt
> > > # touch mnt/blah
> > > # umount mnt
> > > # blockdev --setro /dev/loop0
> > > # dd if=/dev/zero of=/dev/loop0 bs=4k count=1
> > > dd: error writing ‘/dev/loop0’: Operation not permitted
> > > # mount /dev/loop0 mnt
> > > mount: /dev/loop0 is write-protected, mounting read-only
> > > # dmesg
> > > [  419.941649] /dev/loop0: Can't open blockdev
> > > [  419.947106] XFS (loop0): Mounting V5 Filesystem
> > > [  419.952895] XFS (loop0): Ending clean mount
> > > # uname -r
> > > 5.5.0
> > > 
> I think it's important to note that you're calling `blockdev --setro` here,
> which sets the device RO at the block layer...
> 
> As mentioned in the commit message, the SSDs we work with are ATA devices
> and there is no such mechanism in the ATA spec to report to the block layer
> that the device is RO. What we run into is this:

This sounds like you are trying to solve the wrong problem - this
isn't actually a filesystem issue. The fundamental problem is you
have a read-only device that isn't being marked by the kernel as
read-only, and everything goes wrong after that.

Write a udev rule to catch these SSDs at instantiation time and mark
them read only via software. That way everything understands the
device is read only and behaves correctly, rather than needing to make
every layer above the block device understand that a read-write
device is actually read-only...
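
For example, a rule along these lines might do it. This is only a sketch
under some assumptions:

- The file name, the ID_MODEL value and the /sbin/blockdev path are
  hypothetical placeholders that need adapting to the actual drives and
  distro.
- The rule should only be active on systems where the write-protect GPIO
  is actually asserted, because the kernel itself cannot see that strap.
- The rule deliberately matches partitions as well as the whole disk.
  Depending on the kernel version, marking the whole disk read-only may
  not be inherited by partitions that already exist.

```
# /etc/udev/rules.d/70-wp-ssd-ro.rules  (hypothetical file name)
# EXAMPLE_WP_SSD is a placeholder for the model string of the
# write-protected SSDs. ID_MODEL is set for ATA disks, and imported by
# their partitions, in 60-persistent-storage.rules, which is why this
# file sorts after it. The rule fires for the disk and for each partition.
ACTION=="add|change", SUBSYSTEM=="block", ENV{ID_MODEL}=="EXAMPLE_WP_SSD", RUN+="/sbin/blockdev --setro $devnode"
```

Once the rule is installed, `udevadm control --reload` followed by
`udevadm trigger` should apply it to devices that are already present.
`blockdev --getro` on the disk and its partitions can then confirm the
result before anything tries to mount them.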

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx