Re: Intel fakeraid working?

NeilBrown <neilb@xxxxxxx> · Mon, 23 Apr 2012 12:25:20 +1000

On Tue, 10 Apr 2012 09:14:58 -0400 Phillip Susi <psusi@xxxxxxxxxx> wrote:

> On 4/9/2012 7:43 PM, NeilBrown wrote:
> > It will reject writes from user-space, and it will reject attempts to mount a
> > filesystem unless the filesystem is mounted "read-only".
> > But if a read-only mounted filesystem decides to write anyway (XFS, ext3,
> > ext4...) then the block layer doesn't stop it.
> 
> How does that work?  How does the block layer know or care what mount 
> flags are used?  My understanding is that setting the block layer 
> read-only flag with blockdev --setro actually causes the write bios to 
> be rejected, thus preventing ext[34] from playing back the journal. 
> Ubuntu has been using this to prevent accidental damage by read-only 
> mounts since this abhorrent behavior was discovered.  I would think that 
> md should work the same way.
> 

It's actually a long time since I looked at this in detail, so I've had
another hunt around the code to see what the current state really is.
I've only looked at ext3, not other filesystems, but it should be fairly
representative.

ext3 does appear to take care not to write anything if the device is marked
readonly - so maybe I've been doing it a disservice there.  However there
have been bugs.  The most recent was fixed about 18 months ago:

commit 31d710a7bd42f0d89e30d53bdaad427c5f191d0d
Author: Maciej Żenczykowski <zenczykowski@xxxxxxxxx>
Date:   Sun Sep 26 12:38:28 2010 +0000

    ext3: don't update sb journal_devnum when RO dev

    An ext3 filesystem on a read-only device, with an external journal
    which is at a different device number then recorded in the superblock
    will fail to honor the read-only setting of the device and trigger
    a superblock update (write).

    For example:
      - ext3 on a software raid which is in read-only mode
      - external journal on a read-write device which has changed device num
      - attempt to mount with -o journal_dev=<new_number>
      - hits BUG_ON(mddev->ro = 1) in md.c

    Cc: Theodore Ts'o <tytso@xxxxxxx>
    Signed-off-by: Maciej Żenczykowski <zenczykowski@xxxxxxxxx>
    Signed-off-by: Jan Kara <jack@xxxxxxx>

This fix was in 2.6.38.  I don't know if it was backported at all.

The read-only flag does not cause the block-layer to do, or not do, anything.
It is merely information that is made available to the filesystem (or the
"block_dev" pseudo-filesystem that presents /dev/sda or whatever and maps
read/write requests directly to bios which get sent down).
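
For anyone who wants to see what that flag currently says for a given device,
here is a minimal user-space sketch.  It assumes a Linux system and uses the
BLKROGET ioctl from <linux/fs.h>, which reports the same per-device flag that
"blockdev --getro" prints.  The helper name device_is_readonly() is made up
for this example.

```c
#include <fcntl.h>
#include <linux/fs.h>   /* BLKROGET */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: report the read-only flag of a block device.
 * Returns 1 if the flag is set, 0 if it is clear, and -1 if the path
 * cannot be opened or does not support the ioctl (e.g. it is not a
 * block device).
 */
int device_is_readonly(const char *path)
{
    int ro = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* BLKROGET stores a non-zero value in 'ro' when the flag is set. */
    if (ioctl(fd, BLKROGET, &ro) < 0) {
        close(fd);
        return -1;
    }

    close(fd);
    return ro ? 1 : 0;
}
```

Calling device_is_readonly("/dev/md0") on an array that md has put in
read-only mode would be expected to return 1.  Note that this only reads the
flag; it says nothing about whether a given filesystem honours it.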

It is up to the filesystem to honour the read-only setting.  Maybe this is
poor design: I haven't thought about it much.  But that is the way it is.

When ext3 sees that it needs to replay the journal it will check if the
device is read-only. If it is, it will fail with EROFS.  So you cannot
mount an inconsistent filesystem from a read-only device.
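
The decision described above can be sketched as a toy model.  This is not the
ext3 source; struct toy_fs and toy_mount_check() are names invented for the
illustration, and the model covers only the journal-replay check, not the
other checks done at mount time.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the state a filesystem consults at mount time. */
struct toy_fs {
    bool needs_recovery;   /* journal holds transactions to replay */
    bool device_readonly;  /* read-only flag of the underlying device */
};

/*
 * Returns 0 if the mount may proceed, or -EROFS when replaying the
 * journal would require writing to a device marked read-only.
 */
int toy_mount_check(const struct toy_fs *fs)
{
    if (fs->needs_recovery && fs->device_readonly)
        return -EROFS;
    return 0;
}
```

The point of the sketch is that the refusal happens in the filesystem's own
mount path: it reads the flag and decides for itself not to write.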

md does set the same flag that "blockdev --setro" sets.  If some filesystem
writes anyway, then it is definitely a bug and should be fixed.

So it looks like everybody is doing the "right" thing (allowing for
occasional bugs here and there), and your real problem is just that Ubuntu
didn't package mdmon.

NeilBrown