Ken Preslan <kpreslan@xxxxxxxxxx> writes:

...
> Suppose Node A writes inode 23 and Node B writes inode 24 (both at the
> same time). The following sequence of events could occur:
>
> 1) Node A locks inode 23 exclusively
> 2) Node B locks inode 24 exclusively
> 3) Node A starts writing inode 23. This consists of:
>    A) Reading the inode off of Disk 0
>    B) Reading the parity block off of Disk 2
>    C) XORing the old version of the Disk 0 block out of the Disk 2 block
>    D) XORing the new version of the Disk 0 block into the Disk 2 block
> 4) Node B starts writing inode 24. This consists of:
>    A) Reading the inode off of Disk 1
>    B) Reading the parity block off of Disk 2
>    C) XORing the old version of the Disk 1 block out of the Disk 2 block
>    D) XORing the new version of the Disk 1 block into the Disk 2 block
> 5) Node A completes writing inode 23. This consists of:
>    A) Writing the new block to Disk 0
>    A) Writing the new parity block to Disk 2
> 6) Node A completes writing inode 24. This consists of:

That's node B if I am following you correctly.

>    A) Writing the new block to Disk 1
>    A) Writing the new parity block to Disk 2
>
> The problem is that you had two simultaneous read-modify-write operations
> on the parity block. Neither operation took the other one into account.
> So, the data in the non-parity blocks is correct, but the parity block is
> now corrupt. As long as you don't lose a disk, you're fine. But, as soon
> as a disk dies, the values you'll get from reading inode 23 and 24 will
> be completely bogus.

Thanks for the example. That's about as concrete as one could hope for!

> A cluster aware software RAID5 implementation would lock stripes so that
> only one machine could modify a stripe at a time.

It sounds like it would be slow. Maybe not in a situation with
reader-writer locks, where writes were infrequent, though.

-- 
Ed L Cashin <ecashin@xxxxxxxxxx>
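
For anyone who wants to replay the quoted sequence, here is a minimal Python sketch. It is not taken from any real RAID or cluster filesystem code: the Stripe class, its method names, and the block contents are all made up for illustration, and the three "disks" are just byte strings in memory. Each write is split into a read/compute half (steps 3 and 4 above) and a write-back half (steps 5 and 6), so the two orderings can be run deterministically without threads.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class Stripe:
    """Hypothetical one-stripe RAID5 model: data on disks 0 and 1, parity on disk 2."""

    def __init__(self, d0: bytes, d1: bytes):
        self.disks = [d0, d1, xor(d0, d1)]

    def begin_write(self, disk: int, new: bytes):
        """Read old data and parity, XOR the old block out and the new block in."""
        old = self.disks[disk]
        parity = self.disks[2]
        new_parity = xor(xor(parity, old), new)
        return disk, new, new_parity

    def finish_write(self, pending) -> None:
        """Write the new data block and the precomputed parity block."""
        disk, new, new_parity = pending
        self.disks[disk] = new
        self.disks[2] = new_parity

    def parity_ok(self) -> bool:
        return self.disks[2] == xor(self.disks[0], self.disks[1])

    def rebuild(self, lost: int) -> bytes:
        """Reconstruct a lost data block from the surviving data block and parity."""
        return xor(self.disks[1 - lost], self.disks[2])


def racy() -> Stripe:
    """Interleaving from the quoted example: both nodes read parity before either writes."""
    s = Stripe(b"old23", b"old24")
    a = s.begin_write(0, b"new23")  # Node A, step 3
    b = s.begin_write(1, b"new24")  # Node B, step 4 (sees the stale parity)
    s.finish_write(a)               # step 5
    s.finish_write(b)               # step 6 overwrites A's parity update
    return s


def serialized() -> Stripe:
    """Same writes, but each read-modify-write finishes before the next one starts,
    as it would if a per-stripe lock were held around it."""
    s = Stripe(b"old23", b"old24")
    s.finish_write(s.begin_write(0, b"new23"))
    s.finish_write(s.begin_write(1, b"new24"))
    return s


if __name__ == "__main__":
    for name, stripe in (("racy", racy()), ("serialized", serialized())):
        print(name, "parity ok:", stripe.parity_ok(),
              "rebuilt disk 0:", stripe.rebuild(0))
```

In the racy ordering both data blocks hold the new contents, yet the parity check fails and rebuilding disk 0 does not return what Node A wrote. The serialized ordering stands in for the stripe lock Ken describes; it says nothing about how expensive such a lock would be across a cluster.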