[Linux-cluster] Re: GFS on md on shared disks?

Ken Preslan <kpreslan@xxxxxxxxxx> writes:

...
> Suppose Node A writes inode 23 and Node B writes inode 24 (both at the
> same time).  The following sequence of events could occur:
>
> 1)  Node A locks inode 23 exclusively
> 2)  Node B locks inode 24 exclusively
> 3)  Node A starts writing inode 23.  This consists of:
>     A) Reading the inode off of Disk 0
>     B) Reading the parity block off of Disk 2
>     C) XORing the old version of the Disk 0 block out of the Disk 2 block
>     D) XORing the new version of the Disk 0 block into the Disk 2 block
> 4)  Node B starts writing inode 24.  This consists of:
>     A) Reading the inode off of Disk 1
>     B) Reading the parity block off of Disk 2
>     C) XORing the old version of the Disk 1 block out of the Disk 2 block
>     D) XORing the new version of the Disk 1 block into the Disk 2 block
> 5)  Node A completes writing inode 23.  This consists of:
>     A) Writing the new block to Disk 0
>     B) Writing the new parity block to Disk 2
> 6)  Node A completes writing inode 24.  This consists of:

That's node B if I am following you correctly.  

>     A) Writing the new block to Disk 1 
>     B) Writing the new parity block to Disk 2
>
> The problem is that you had two simultaneous read-modify-write operations
> on the parity block.  Neither operation took the other one into account.
> So, the data in the non-parity blocks is correct, but the parity block is
> now corrupt.  As long as you don't lose a disk, you're fine.  But, as soon
> as a disk dies, the values you'll get from reading inode 23 and 24 will
> be completely bogus.

Thanks for the example.  That's about as concrete as one could hope for!
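
The sequence is small enough to replay as a toy model. What follows is
only a sketch under stated assumptions, not how md does it: each "block"
is a single integer, the stripe has two data disks plus one parity disk,
and the values 0x23, 0x24, 0xAA and 0xBB are made up for illustration.
The two nodes' steps are interleaved by hand in the order quoted above.

```python
# Toy model of one RAID5 stripe: Disk 0 and Disk 1 hold data, Disk 2 holds
# parity. Blocks are single ints here; a real array XORs whole blocks.
# All values are hypothetical.

def simulate_unlocked_writes():
    disks = [0x23, 0x24, 0x23 ^ 0x24]   # parity starts out consistent
    new_a, new_b = 0xAA, 0xBB           # what Node A and Node B want to write

    # Steps 3 and 4: each node reads its old data block and the OLD parity,
    # then XORs the old data out and the new data in.
    parity_a = disks[2] ^ disks[0] ^ new_a
    parity_b = disks[2] ^ disks[1] ^ new_b

    # Step 5: Node A writes its data block and its idea of the parity.
    disks[0] = new_a
    disks[2] = parity_a

    # Step 6: Node B does the same, overwriting Node A's parity update.
    disks[1] = new_b
    disks[2] = parity_b
    return disks, new_a, new_b

disks, new_a, new_b = simulate_unlocked_writes()

# Parity is consistent only if it equals the XOR of the two data blocks.
parity_ok = disks[2] == (disks[0] ^ disks[1])

# If Disk 0 died, its block would be rebuilt from Disk 1 and the parity.
rebuilt_disk0 = disks[1] ^ disks[2]

print("parity consistent:", parity_ok)
print("disk 0 rebuilt correctly:", rebuilt_disk0 == new_a)
```

Both data blocks hold the new values, yet the parity check fails and the
rebuilt Disk 0 block does not match what Node A wrote.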

> A cluster aware software RAID5 implementation would lock stripes so that
> only one machine could modify a stripe at a time.

It sounds like it would be slow, though maybe not with reader-writer
locks in a workload where writes are infrequent.
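
For comparison, here is a rough sketch of the per-stripe locking idea.
It rests on some loud assumptions: Python threads stand in for cluster
nodes, and a plain threading.Lock stands in for whatever cluster-wide
lock a real implementation would use. The Stripe class and its write
method are hypothetical names, not part of md or GFS.

```python
import threading

class Stripe:
    """Toy stripe: two data blocks plus one parity block (hypothetical)."""

    def __init__(self, d0, d1):
        self.blocks = [d0, d1, d0 ^ d1]
        # Stand-in for a cluster-wide stripe lock; a real cluster would
        # need a distributed lock, not an in-process one.
        self.lock = threading.Lock()

    def write(self, index, new_value):
        # Hold the lock across the whole read-modify-write so that no
        # other writer can read a stale parity block in between.
        with self.lock:
            old = self.blocks[index]
            parity = self.blocks[2] ^ old ^ new_value
            self.blocks[index] = new_value
            self.blocks[2] = parity

stripe = Stripe(0x23, 0x24)
nodes = [
    threading.Thread(target=stripe.write, args=(0, 0xAA)),  # "Node A"
    threading.Thread(target=stripe.write, args=(1, 0xBB)),  # "Node B"
]
for t in nodes:
    t.start()
for t in nodes:
    t.join()

stripe_consistent = stripe.blocks[2] == (stripe.blocks[0] ^ stripe.blocks[1])
print("parity consistent:", stripe_consistent)
```

The cost shows up in the with-block: every write to the stripe is
serialized behind that one lock, whichever node issues it.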

-- 
  Ed L Cashin <ecashin@xxxxxxxxxx>

