On Fri, 2010-06-18 at 13:52 +1000, Neil Brown wrote: > On Thu, 17 Jun 2010 12:47:00 +0200 > Heinz Mauelshagen <heinzm@xxxxxxxxxx> wrote: > > > On Thu, 2010-06-17 at 15:41 +1000, Neil Brown wrote: > > > On Wed, 16 Jun 2010 13:26:27 +0200 > > > Heinz Mauelshagen <heinzm@xxxxxxxxxx> wrote: > > > > > > > On Wed, 2010-06-16 at 09:45 +1000, Neil Brown wrote: > > > > > Hi Heinz, > > > > > > > > > > On Tue, 15 Jun 2010 15:23:26 +0200 > > > > > Heinz Mauelshagen <heinzm@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > > > > Neil, > > > > > > > > > > > > for missing devices we (re)load the table with error mapped device(s) in > > > > > > their place rather than identifying them with a special char/string. > > > > > > Did you go for the later in order to avoid some MD hassle with error > > > > > > targets being accessed by the MD personalities? If not so, we can drop > > > > > > the special '-' char support to identify missing devices. > > > > > > > > > > When raid5 determines that a device has failed and so it will not write to it > > > > > again, it must be sure that the failure is record in the metadata before it > > > > > proceeds with that decision not to even try writing to the failed device. > > > > > > > > Yes, that's the mandatory order to avoid data loss. > > > > > > > > > Otherwise a crash/restart before the metadata was safely updated would result > > > > > in corruption. > > > > > > > > > > This means that it must be possible for user-space to explicitly tell the > > > > > raid5 that a device is known to be faulty. > > > > > Doing that through the constructor seems the natural way to do it. > > > > > > > > If /dev/mapper/error with an error target mapping would be used instead, > > > > would that cause consistency troubles to the MD personality if accessing > > > > those rather than the NULL device solution with the '-' argument you > > > > have now? > > > > If not so, we could drop the '-' support, which I'm aiming at. > > > > > > > > > > I would need to differentiate between /dev/mapper/error (where errors are > > > expected) and any other device (where errors are not expected). How do you > > > propose I do that? > > > > Is that differentiation mandatory for the MD personality at all? > > I thought we had already agreed that it was. No ;) The following is based on the assumption, that you're refering to MD metadata... > > If the metadata records that a device is working, then raid5 will not expect > an error and if it gets one it must wait for the metadata to be updated. > If the metadata records that a device is not working, then raid5 will never > bother even trying to write to that device. So that will allow for a /dev/mapper/error device instead of the '-' because like you say, RAID5 is not going to write to it anyway. And it shouldn't bother reading neither as long as the metadata tells it the drive is dead. > > So yes, and important distinction. > > I imagine that the dmevent handler would handle a failure by updating the > metadata, then dismantling the array and recreating it with the failed device > missing. Yes, it updates the upspace metadata (ie. for lvm2/dmraid). > Given that in some cases it would want to do this anyway to > introduce a spare, and as I think this is how dm-raid1 errors are handled, it > seems the logical approach. > I note that your dm-raid45 code uses a message to "allow_writes". Yes. > Using that could be racy unless you suspend the device before updating the > metadata, and if you do that you may as well use the suspend/resume sequence > as part of re-enabling writes. (???) The message can be used optionally vs. suspend/resume pairs for performance reasons to allow for max reads to go through. Of course it needs to be called in the proper sequence (same applies to suspend/resume) and then it's not racy. > > This is probably just a question of interface design, and we both naturally > prefer the one we came up with first :-) Well... ;-) > > I guess it just feels clumsy for the raid5 driver to have to try to read and > fail rather than simply being told up front not to bother trying. If I understand you correctly, the state changes in the RAID5 personalities caused by a bad drive declared dead would lead to remembering the bad drive persistently and they (the personalities RAID4/5/6) wouldn't even bother accessing it, so it won't make a difference to put an error mapped device in place of the dead one. > With md/raid5, a read failure will be followed by an attempt to write out the > good data in case it was a simple media error. Having it do that every time > you start a degraded array seems extra pointless.... Hrm, getting confused here: so it'll still try reading even though the MD metadata reflects it to be dead? /me trying to get twists sorted :) Regards, Heinz > > > > > > You'll get immediate and durable errors from the error target and hence > > you'd instantly drop the leg. > > > > > > > > > > > > > > > > > > > > > > I'm thinking about adding a ctr wrapper to allow us to keep the "raid45" > > > > > > ctr interfaces semantics (ie. the arguments) and the new interface to > > > > > > "raid456" if you don't have objections. That would be implemented by 2 > > > > > > targets being registered implemented in one module. > > > > > > > > > > I have no strong objections, though having two targets that do almost the > > > > > same things seems rather ugly. > > > > > > > > Keep in mind it'll only be another target_type structs, another > > > > raid_ctr() function plus another (de)registration of the respective > > > > target. The ctr functions for raid45 and raid456 can share moset of the > > > > code anyway. > > > > > > Sure. > > > > > > > > > > > > > > > > > Is there existing published code that uses the extra arguments to raid45? > > > > > > > > Yes, dmraid with a list of RAID5 supporting metadata format handlers. > > > > > > I just had a look at the latest dmraid from CVS and the only place that I can > > > find where a raid45 target is created is in lib/activate/activate.c in > > > _dm_raid45_bol. > > > This only uses arguments that my code understands. > > > What am I missing? > > > > I was arguing against breaking API compatibility. > > dm-raid45.c is being shipped by distros. > > > > Shipped by distros which all have an 'upstream first' policy, and are fully > aware of the possible consequences of not following it :-) > > I'm not really that fussed about the API. I don't like seeing APIs that I > think are badly designed go into Linux, but plenty of them do and one more > isn't going to make much difference. > > Approving an API is really a job for the dm maintainer. Is that Alasdair? > Having stated my case I'm happy to accept whatever he prefers. > > NeilBrown > > > Thanks, > > Heinz > > > > > Thanks, > > > NeilBrown > > > -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel