Re: ordered I/O with multipath

> Bryan Henderson wrote:
> > > If the RAID code is changed to handle barriers, that would still have
> > > possible "scattershot" corruption on RAID-5, because writing a single
> > > sector on the logical device affects more than one visible sector if
> > > it is interrupted.  In other words, the "radius of corruption" is
> > > bigger than one sector for RAID-5, and it's not contiguous either.
> > 
> > I've seen several RAID-5 systems, and they all went to great lengths to
> > ensure that interrupting a write to Sector A can't destroy Sector B.  It
> > isn't easy; it involves journalling.  But I've always taken it as an
> > absolute requirement.
> 
> How do you do a second layer of journalling (in addition to the
> filesystem's) without a big performance penalty for the extra seeks?

The systems I know all have a means of storing data persistent across the 
kinds of restarts in question without seeking.  It's probably the only way 
to get great performance with data integrity.
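
To make the Sector A / Sector B problem concrete, here is a minimal sketch
of it, together with the kind of fix I mean.  It assumes a toy stripe of two
data sectors plus XOR parity, and a hypothetical "nvram" slot standing in
for whatever persistent, seek-free storage a given system has.  None of the
names come from a real RAID implementation.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

class ToyStripe:
    """Two data sectors plus parity; `nvram` is a hypothetical persistent slot."""

    def __init__(self, a: bytes, b: bytes):
        self.disk = {"A": a, "B": b, "P": xor_blocks(a, b)}
        self.nvram = None  # pending (sector name, new data); survives a crash

    def write(self, name: str, data: bytes, crash_before_parity: bool = False):
        self.nvram = (name, data)   # record the intent first, no disk seek
        self.disk[name] = data      # new data reaches its disk
        if crash_before_parity:
            return                  # power fails here: parity is now stale
        self.disk["P"] = xor_blocks(self.disk["A"], self.disk["B"])
        self.nvram = None           # update complete, retire the record

    def recover(self):
        """On restart, redo any recorded write so data and parity agree."""
        if self.nvram is not None:
            name, data = self.nvram
            self.write(name, data)

    def rebuild_b(self) -> bytes:
        """Reconstruct Sector B as if the disk holding it had failed."""
        return xor_blocks(self.disk["A"], self.disk["P"])

stripe = ToyStripe(b"AAAA", b"BBBB")
stripe.write("A", b"aaaa", crash_before_parity=True)
damaged = stripe.rebuild_b()    # stale parity: B comes back wrong
stripe.recover()
repaired = stripe.rebuild_b()   # parity redone from the record: B is intact
print(damaged == b"BBBB", repaired == b"BBBB")  # prints: False True
```

Sector B was never written, yet without the recorded intent it cannot be
reconstructed correctly after the interrupted write to Sector A.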

But some things about Linux block device RAID-5 are coming back to me.  In 
the early implementations, if the system restarted without explicitly 
shutting down the array (as in a power failure), all of the parity in the 
array would be rebuilt.  Later, a "write intent bitmap" was added so it 
could rebuild substantially less than all of the parity.  That bitmap is 
the journal I was talking about, and I don't know what if anything it does 
to avoid a big performance penalty.
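
The general idea of such a bitmap, as I understand it, can be sketched
roughly as follows.  This is an illustration under my own assumptions (one
flag per fixed-size region, one write in flight at a time, made-up sizes),
not the actual Linux md code, and it ignores the question of how the bitmap
itself is made persistent cheaply.

```python
class ToyWriteIntentBitmap:
    """One dirty flag per fixed-size region of the array (hypothetical layout)."""

    def __init__(self, total_stripes: int, stripes_per_region: int):
        self.total_stripes = total_stripes
        self.stripes_per_region = stripes_per_region
        regions = -(-total_stripes // stripes_per_region)  # ceiling division
        self.dirty = [False] * regions

    def region_of(self, stripe: int) -> int:
        return stripe // self.stripes_per_region

    def begin_write(self, stripe: int) -> None:
        # Would have to be persistent before data or parity writes are issued.
        self.dirty[self.region_of(stripe)] = True

    def end_write(self, stripe: int) -> None:
        # Toy simplification: assumes no other write is in flight in the region.
        self.dirty[self.region_of(stripe)] = False

    def stripes_to_resync(self) -> list:
        """Stripes whose parity must be rebuilt after an unclean restart."""
        out = []
        for region, is_dirty in enumerate(self.dirty):
            if is_dirty:
                start = region * self.stripes_per_region
                end = min(start + self.stripes_per_region, self.total_stripes)
                out.extend(range(start, end))
        return out

bitmap = ToyWriteIntentBitmap(total_stripes=1000, stripes_per_region=100)
bitmap.begin_write(5)
bitmap.end_write(5)        # completed cleanly, region 0 is clean again
bitmap.begin_write(250)    # power fails before this write completes
resync = bitmap.stripes_to_resync()
print(len(resync), resync[0], resync[-1])  # prints: 100 200 299
```

Only the 100 stripes in the region that was being written need their parity
rebuilt, instead of all 1000.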

> But a failed write might corrupt previously
> hardened sectors in these cases:
> 
>     - Disks with 4k sectors pretending to be 512 byte sectors.

AFAIK there are no such disks today and there is a big controversy over 
whether it's acceptable for such disks currently being designed to allow 
such corruption.
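
For reference, the mechanism people worry about can be sketched like this.
It assumes a drive that serves a 512-byte logical write by read-modify-write
of a whole 4096-byte physical sector, and it models an interrupted media
write as leaving the entire physical sector unreadable.  Both are
assumptions made for illustration, not a description of any actual drive.

```python
LOGICAL = 512
PHYSICAL = 4096
PER_PHYSICAL = PHYSICAL // LOGICAL  # 8 logical sectors share a physical one

def neighbours_at_risk(lba: int) -> list:
    """Logical sectors sharing a physical sector with `lba`, excluding it."""
    first = (lba // PER_PHYSICAL) * PER_PHYSICAL
    return [n for n in range(first, first + PER_PHYSICAL) if n != lba]

def write_logical(physical: bytearray, lba: int, data: bytes,
                  torn: bool = False) -> None:
    """Read-modify-write one logical sector inside its physical sector."""
    assert len(data) == LOGICAL and len(physical) == PHYSICAL
    offset = (lba % PER_PHYSICAL) * LOGICAL
    image = bytearray(physical)              # read the whole physical sector
    image[offset:offset + LOGICAL] = data    # modify the one logical sector
    if torn:
        # Assumed failure model: the interrupted write ruins the whole sector.
        physical[:] = b"\xff" * PHYSICAL
    else:
        physical[:] = image                  # write the whole sector back

sector = bytearray(b"\x00" * PHYSICAL)       # holds logical sectors 8..15
write_logical(sector, 10, b"\x01" * LOGICAL)             # hardened earlier
write_logical(sector, 11, b"\x02" * LOGICAL, torn=True)  # interrupted write
lba10_survived = bytes(sector[1024:1536]) == b"\x01" * LOGICAL
print(neighbours_at_risk(11), lba10_survived)
# prints: [8, 9, 10, 12, 13, 14, 15] False
```

Under that failure model, the interrupted write to logical sector 11 takes
the previously hardened sector 10 with it.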

>     - RAIDs without journalling (or other equivalent) and no
>       battery backup.

I still don't know if anybody is doing that.

>     - SSDs and other flash storage if their internal algorithms are
>       stupid.

I don't know if that's commonly accepted either.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Storage Systems
