> -----Original Message-----
> From: Neil Brown [mailto:neilb@xxxxxxx]
> Sent: Sunday, August 08, 2010 4:56 AM
> To: st0ff@xxxxxx
> Cc: stefan.huebner@xxxxxxxxxxxxxxxxxx; Foster, Brian; linux-raid@xxxxxxxxxxxxxxx
> Subject: Re: --assume-clean on raid5/6
>
> On Sat, 07 Aug 2010 14:28:55 +0200
> Stefan /*St0fF*/ Hübner <stefan.huebner@xxxxxxxxxxxxxxxxxx> wrote:
>
> > Hi Brian,
> >
> > --assume-clean skips over the initial resync, which - if you will
> > create a filesystem after creating the array - is a time-saving idea.
> > But keep in mind: even if the disks are brand new and contain only
> > zeros, the parity would probably not be all zeros, so reading from
> > such an array would be a bad idea.
> > But if the next thing you do is create LVM/filesystem etc., then all
> > bits read from the array will have been written before (and are
> > therefore in sync).
>
> There is an important point that this misses.
>
> When md updates a block on a RAID5 it will sometimes use a
> read-modify-write cycle which reads the old block and old parity,
> subtracts the old block from the parity block and then adds the new
> block to the parity block. Then it writes the new data block and the
> new parity block.
>
> If the old parity was correct for the old stripe, then the new parity
> will be correct for the new stripe. But if the old parity was wrong,
> then the new will be wrong.
>
> So if you use --assume-clean then the parity may well be wrong and
> could remain wrong even when you write new data. If you then lose a
> device, the data for that device will be computed using the wrong
> parity and you will get wrong data - hence data corruption.
>
> So you should only use --assume-clean if you know the array really is
> 'clean'.

Thanks for the information, guys. I was actually attempting to test
whether this could occur with a high-level sequence similar to the
following (sketched as shell commands below):

- dd /dev/urandom data to 4 small partitions (~10MB each).
- Create a raid5 with --assume-clean on said partitions.
- Write a small bit of data (32 bytes) to the beginning of the md, and
  capture an image of the md to a file.
- Fail/remove a drive from the md, and capture a second md file image.
- cmp the file images to see what changed, and read back the first 32
  bytes of data.
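Roughly like this, with illustrative device names (not necessarily the
exact partitions or sizes I used):

    # Fill four small partitions with random data, so parity starts out
    # inconsistent.
    for dev in /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1; do
        dd if=/dev/urandom of=$dev bs=1M count=10
    done

    # Create the RAID5, skipping the initial resync.
    mdadm --create /dev/md0 --level=5 --raid-devices=4 --assume-clean \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

    # Write 32 bytes to the start of the array, then image the array.
    printf '0123456789abcdef0123456789abcdef' | dd of=/dev/md0
    dd if=/dev/md0 of=/tmp/md.before

    # Fail and remove one member, then image the array again.
    mdadm /dev/md0 --fail /dev/sdd1 --remove /dev/sdd1
    dd if=/dev/md0 of=/tmp/md.after

    # Compare the two images and read back the first 32 bytes.
    cmp -l /tmp/md.before /tmp/md.after
    dd if=/dev/md0 bs=32 count=1 2>/dev/null | od -c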
In this scenario I do observe differences in the file image, but my
data remains intact. I ran this sequence multiple times, failing a
different drive in the array each time, and I also tried stopping and
restarting the array (with a drop_caches in between) before the
drive-failure step.

This leads to my question: is there a write test that can reproduce
data corruption under this scenario, or is the rmw cycle an
optimization that is not so deterministic? Also, out of curiosity,
would --assume-clean be safe on a raid5 if the drives were explicitly
zeroed beforehand? Thanks again.

Brian

> RAID1/RAID10 cannot suffer from this, so --assume-clean is quite safe
> with those array types.
> The current implementation of RAID6 never does read-modify-write, so
> --assume-clean is currently safe with RAID6 too. However, I do not
> promise that RAID6 will not change to use read-modify-write cycles in
> some future implementation. So I would not recommend using
> --assume-clean on RAID6 just to avoid the resync cost.
>
> NeilBrown
>
> >
> > Stefan
> >
> > Am 06.08.2010 03:19, schrieb brian.foster@xxxxxxx:
> > > Hi all,
> > >
> > > I've read in the list archives that use of --assume-clean on raid5
> > > (raid6?) is not safe if the member drives are not in sync, but
> > > it's not clear to me as to why. I can see the content of a written
> > > raid5 array change if I fail a drive out of the array (created
> > > with --assume-clean), but data that I write prior to failing a
> > > drive remains intact. Perhaps I'm missing something. Could
> > > somebody elaborate on the danger/risk of using --assume-clean?
> > > Thanks in advance.
> > >
> > > Brian
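To make the read-modify-write hazard concrete, here is a toy
one-byte-per-disk sketch (all values invented) of a two-data-disk
stripe whose parity was left stale by --assume-clean. Note also that,
as I understand it, whether md actually takes the rmw path for a given
write depends on how much of the stripe is being written (md picks
whichever of rmw and reconstruct-write is cheaper), which may be why a
small test does not reliably hit the corruption:

    # Toy stripe: data "disks" D0 and D1, plus parity P. Correct parity
    # would be P = D0 ^ D1 = 0x33, but the stale parity is 0x00.
    D0=0x11; D1=0x22; P=0x00

    # RMW update of D0 to 0x55: new P = old P ^ old D0 ^ new D0.
    # The pre-existing 0x33 parity error is carried along unchanged.
    NEW=0x55
    P=$(( P ^ D0 ^ NEW )); D0=$NEW    # P is now 0x44; it should be 0x77

    # "Lose" D1 and reconstruct it from the survivors:
    printf 'reconstructed D1 = 0x%02x (real D1 was 0x22)\n' $(( D0 ^ P ))
    # Prints 0x11 - the lost disk's data is reconstructed wrongly.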