On Mon, Mar 1, 2010 at 11:36 PM, Luca Berra <bluca@xxxxxxxxxx> wrote:
> On Tue, Mar 02, 2010 at 04:01:00PM +1100, Neil Brown wrote:
>>
>> On Sun, 28 Feb 2010 09:09:49 +0100
>> Luca Berra <bluca@xxxxxxxxxx> wrote:
>>
>>> On Thu, Feb 25, 2010 at 08:39:36AM +1100, Neil Brown wrote:
>>> >On Wed, 24 Feb 2010 11:12:09 -0500
>>> >"Martin K. Petersen" <martin.petersen@xxxxxxxxxx> wrote:
>>> >
>>> >> So realistically both disk blocks are wrong and there's a window until
>>> >> the new, correct block is written. That window will only cause problems
>>> >> if there is a crash and we'll need to recover. My main concern here is
>>> >> how big the discrepancy between the disks can get, and whether we'll end
>>> >> up corrupting the filesystem during recovery because we could
>>> >> potentially be matching metadata from one disk with journal entries from
>>> >> another.
>>> >
>>> >After a crash, md will only read from one of the devices (the first) until a
>>> >resync has completed. So there should be no room for more confusion than you
>>> >would expect on a single device.
>>>
>>> After thinking more about this i could come up with another concern
>>> about write ordering.
>>>
>>> example
>>> app writes block A, B, C
>>> md writes A on both disks
>>> md writes B on disk1
>>> app writes B again (B')
>>> md writes B' on disk2
>>> now md would write B' again on both disks, but the system crashes
>>> (note, C is never written due to crash)
>>>
>>> Disk 1 contains A and B in the correct order, it is missing C and B' but we
>>> dont care, app should be able to recover from a crash
>>>
>>> Disk 2 contains A and B', but they are wrongly ordered because C is missing
>>>
>>> If in the above case A and C are data blocks and B contains a journal
>>> related to A and C, booting from disk 2 could result in inconsistent data.
>>>
>>> can the above really happen?
>>> would using barriers remove the above concern?
>>> am i missing something else?
>>
>> There is no inconsistency here that a filesystem would not equally expect
>> from a single device.
>> After the crash-while-writing B', it should expect to see either B or B',
>> and it does, depending on which device is primary.
>>
>> Nothing to see here.
>
> I will try to explain better,
> the problem is not related to the confusion between B or B'
>
> the problem is that on one disk we have B' _without_ C.
>
> Regards,
> L.
>
> --
> Luca Berra -- bluca@xxxxxxxxxx
>         Communication Media & Services S.r.l.
>  /"\
>  \ /     ASCII RIBBON CAMPAIGN
>   X        AGAINST HTML MAIL
>  / \
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

You're demanding full atomic commits; this is precisely what journals
and /barriers/ are for. Are you bypassing them in a quest for
performance and paying for it on crashes? Or is this a hardware bug?
Or is it some glitch in the block device layering leading to barrier
requests not being honored?
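
To make the disputed case concrete, the sequence from Luca's example can be replayed in a small toy model. This is only a sketch: the names (APP_WRITES, EVENTS, replay, ordering_violations) and the lower-case block labels are made up for illustration, and nothing here reflects how md actually schedules or mirrors writes. It just records what each disk holds at the crash and flags any write that is present while an earlier write from the application is absent.

```python
# Toy model of the write sequence from the thread. All names here are
# hypothetical; this is not md code and does not model md's real behaviour.

# Writes in the order the application issued them: (label, target block).
APP_WRITES = [("A", "a"), ("B", "b"), ("C", "c"), ("B'", "b")]
ORDER = {label: i for i, (label, _) in enumerate(APP_WRITES)}
BLOCK_OF = dict(APP_WRITES)

# What reached each disk before the crash, in the order given in the example.
EVENTS = [
    ("disk1", "A"),
    ("disk2", "A"),
    ("disk1", "B"),
    ("disk2", "B'"),  # disk2 receives the rewritten buffer
    # crash: C and the final B' on both disks are never written
]


def replay(events):
    """Apply the events and return {disk: {block: label of last write}}."""
    disks = {"disk1": {}, "disk2": {}}
    for disk, label in events:
        disks[disk][BLOCK_OF[label]] = label
    return disks


def ordering_violations(contents):
    """Return (later, earlier) pairs: 'later' is on disk, 'earlier' is not.

    An earlier write counts as reflected if its block holds that write or
    a newer write to the same block (B overwritten by B' is not a violation).
    """
    violations = []
    for later in contents.values():
        for earlier, block in APP_WRITES[:ORDER[later]]:
            held = contents.get(block)
            if held is None or ORDER[held] < ORDER[earlier]:
                violations.append((later, earlier))
    return violations


if __name__ == "__main__":
    for name, contents in replay(EVENTS).items():
        print(name, contents, ordering_violations(contents))
```

In this model disk1 shows no violations, while disk2 reports B' present without C, which is the state Luca is asking about. Whether that state can really be reached, and whether a barrier between C and B' rules it out, depends on the actual md and filesystem behaviour, which the model does not capture.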