On Tuesday January 4, ptb@xxxxxxxxxxxxxx wrote:
>
> Uh, that's not at issue. The question is whether it is CORRECT, not
> whether it is consistent.
>

What exactly do you mean by "correct"?

If I have a program that writes some data:

    write(fd, buffer, 8192);

and then makes sure the data is on disk:

    fsync(fd);

but the computer crashes sometime between when the write call started and
when the fsync call ended, and I then reboot and read back that block of
data from disk, what is the "CORRECT" value that I should read back?

The answer is, of course, that there is no one "correct" value.
It would be correct to find the data that I had tried to write. It would
also be correct to find the data that had been in the file before I
started the write. If the size of the write is larger than the blocksize
of the filesystem, it would also be correct to find a mixture of the old
data and the new data.

Exactly the same is true at every level of the storage stack. There is a
point in time where a write request starts, and a point in time where the
request is known to complete, and between those two times the content of
the affected area of storage is undefined, and could have any of several
(probably 2) "correct" values.

After an unclean shutdown of a raid1 array, every (working) device has
correct data on it. They may not all be the same, but they are all
correct.

md arbitrarily chooses one of these correct values, and replicates it
across all drives. While it is replicating, all reads are served by the
chosen drive.

NeilBrown
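
To make the argument above concrete, here is a minimal sketch of a toy model in C. It is not kernel or md code, and every name in it (acceptable_after_crash, resync_mirrors) is hypothetical. It also assumes that each filesystem block reaches the disk atomically, which the message does not state; under that assumption a post-crash image is acceptable when every block holds either its old or its new contents, and a resync simply copies one arbitrarily chosen mirror over the others.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model only; all names here are hypothetical.
 * Assumption: a block of `blocksize` bytes is written atomically, so after
 * a crash each block holds either its old or its new contents.
 * Returns true if `found` is one of the acceptable post-crash images.
 */
bool acceptable_after_crash(const unsigned char *old_data,
                            const unsigned char *new_data,
                            const unsigned char *found,
                            size_t len, size_t blocksize)
{
    assert(blocksize > 0);
    for (size_t off = 0; off < len; off += blocksize) {
        /* The last block may be shorter than blocksize. */
        size_t n = (len - off < blocksize) ? len - off : blocksize;
        bool is_old = memcmp(found + off, old_data + off, n) == 0;
        bool is_new = memcmp(found + off, new_data + off, n) == 0;
        if (!is_old && !is_new)
            return false;   /* neither old nor new data in this block */
    }
    return true;
}

/*
 * Resync sketch: copy the chosen mirror over every other mirror.
 * In this model any mirror is an equally valid choice.
 */
void resync_mirrors(unsigned char **mirrors, size_t nmirrors,
                    size_t len, size_t chosen)
{
    assert(chosen < nmirrors);
    for (size_t i = 0; i < nmirrors; i++) {
        if (i != chosen)
            memcpy(mirrors[i], mirrors[chosen], len);
    }
}
```

With a blocksize of 4 and an 8-byte write, this model accepts the all-old image, the all-new image, and either half-and-half mixture, but rejects an image in which a single block contains bytes from both versions.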