On Tue, 24 Nov 2009, malahal@xxxxxxxxxx wrote:

> I need to look at the code again, but I thought any new writes to a
> failed region go to a surviving leg. In that case, we end up returning
> I/O's to the application after writing to a single leg.

Writes always go to all the legs, see do_write(). Anyway, dmeventd removes
the failed leg soon.

> > > Also, we do need to do the above work only if "primary" leg fails. We
> > > can continue to work just like the old code if "secondary" legs fail,
> > > right? Not sure if this is worth optimizing though, but I would like to
> > > see it implemented as it is just a few extra checks. We can have
> > > primary_failure field like log_failure field.
> >
> > I thought about it too, but concluded that we need to hold bios even if
> > the secondary leg fails.
> >
> > Imagine this scenario:
> > * secondary leg fails
> > * write fails on the secondary leg and succeeds on the primary leg
> >   and is successfully completed
> > * the computer crashes
> > * after a reboot, the primary leg is inaccessible and the secondary leg is
> >   back online --- now raid1 would be returning stale data.
>
> The software can detect this case. We can fail this completely or use
> the data from the secondary that could be "stale" with help from admin.
> Let us call this method 1.

You can't detect it, because the computer crashed *before* the information
that the secondary leg failed was written to the metadata. So, after a
reboot, you can't tell whether any mirror leg failed some requests before
the crash.
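The crash argument above can be modelled in a few lines. This is a hypothetical C sketch, not dm-raid1 code: struct mirror, mirror_write(), record_secondary_failure() and the secondary_removed flag (standing in for the failure that dmeventd persists) are illustrative assumptions. It compares completing a write as soon as the primary leg succeeds with holding it until the secondary leg's failure has been recorded, and checks whether a write the application saw as committed can come back stale once the primary leg is lost after a crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-leg mirror holding a single block; illustrates the
 * argument only, it is not the dm-raid1 data structure. */
struct mirror {
    int primary;            /* on-disk contents of the primary leg */
    int secondary;          /* on-disk contents of the secondary leg */
    bool secondary_removed; /* failure persisted (e.g. leg removed by dmeventd) */
    bool write_held;        /* a bio is being held, not yet completed */
    int pending_value;      /* value carried by the held bio */
    bool write_acked;       /* has the application been told "success"? */
    int acked_value;        /* value the application believes is committed */
};

enum policy { COMPLETE_IMMEDIATELY, HOLD_UNTIL_RECORDED };

/* Issue a write to both legs; in this model the secondary leg fails it. */
static void mirror_write(struct mirror *m, int value, enum policy p)
{
    assert(m != NULL);
    m->primary = value;           /* the write succeeds on the primary leg */
    /* m->secondary is left unchanged: the write failed on that leg */
    if (p == COMPLETE_IMMEDIATELY) {
        m->write_acked = true;    /* report success to the application now */
        m->acked_value = value;
    } else {
        m->write_held = true;     /* hold the bio; nothing is reported yet */
        m->pending_value = value;
    }
}

/* The secondary leg's failure becomes persistent; only then is a held
 * write completed. */
static void record_secondary_failure(struct mirror *m)
{
    m->secondary_removed = true;
    if (m->write_held) {
        m->write_held = false;
        m->write_acked = true;
        m->acked_value = m->pending_value;
    }
}

/* After a crash and reboot the primary leg is inaccessible. Returns false
 * if the array must be treated as inaccessible, else stores what a read
 * of the surviving copy returns. */
static bool read_after_crash_primary_lost(const struct mirror *m, int *out)
{
    if (m->secondary_removed)
        return false;             /* no valid copy left: fail, don't guess */
    *out = m->secondary;          /* nothing marks this copy as out of date */
    return true;
}

/* True if the application was told a value was committed, yet a read
 * after the crash returns something older. */
static bool committed_write_reverted(const struct mirror *m)
{
    int v;
    if (!m->write_acked)
        return false;             /* nothing was reported as committed */
    if (!read_after_crash_primary_lost(m, &v))
        return false;             /* array fails instead of returning old data */
    return v != m->acked_value;
}
```

With COMPLETE_IMMEDIATELY and a crash before record_secondary_failure() runs, committed_write_reverted() returns true: the secondary still holds the old value and nothing on disk says it is out of date. With HOLD_UNTIL_RECORDED the write is either never acknowledged, or it is acknowledged only after the secondary has been removed, in which case the read fails rather than returning stale data.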

> > If we hold the bios if the secondary leg fails (as the patch does), one of
> > these two scenarios happens:
> >
> > * secondary leg fails
> > * write succeeds on the primary leg and is held
> > * the computer crashes
> > * after a reboot, the primary leg is inaccessible and the secondary leg is
> >   back online --- but we haven't completed the write, so the transaction
> >   wasn't reported as committed
> >
> > or
> >
> > * secondary leg fails
> > * write succeeds on the primary leg and is held
> > * dmeventd removes the secondary leg and the write succeeds
> > * the computer crashes
> > * after a reboot, the primary leg is inaccessible, the secondary leg was
> >   already removed by dmeventd, so the array is considered inaccessible. So
> >   it doesn't work, but at least it doesn't revert an already committed
> >   transaction.
>
> How is this latter case (it doesn't need a crash anyway)
> different/better from the case where we detect that 'primary' is missing
> and ask admin if he wants to use the data on the secondary or not? At
> least, the admin has a choice with "method 1" and this doesn't have that
> choice.

If you always ask the admin when the primary leg fails and wait for his
action, you lose fault tolerance --- the computer would wait until the
admin does something.

The requirements are:
* if one of the legs fails or the log fails, you must automatically
  continue without human intervention
* if both legs fail, you must shut it down and not pretend that something
  was written when it wasn't (this would break the durability requirement
  of transactions).

Mikulas

> Thanks, Malahal.
>
> --
> dm-devel mailing list
> dm-devel@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/dm-devel
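The two requirements can be read as a completion rule applied to each mirrored write. The C sketch below is hypothetical: decide(), enum write_outcome and its values are illustrative names, not dm-raid1 identifiers, and treating a log failure like a single-leg failure (hold the bio until the failure is recorded) is an assumption suggested by the log_failure field mentioned in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome for one mirrored write; not dm-raid1 identifiers. */
enum write_outcome {
    COMPLETE_OK,  /* every leg and the log succeeded: report success now */
    HOLD,         /* partial failure: hold the bio until the failure is
                     recorded automatically (e.g. by dmeventd), then
                     report success; no administrator is involved */
    COMPLETE_EIO  /* no leg holds the data: report an error */
};

/* legs_ok: number of legs on which the write succeeded
 * nr_legs: number of legs in the mirror
 * log_ok:  whether the log update succeeded (assumed to be handled
 *          like a single-leg failure) */
static enum write_outcome decide(unsigned legs_ok, unsigned nr_legs,
                                 bool log_ok)
{
    assert(nr_legs > 0 && legs_ok <= nr_legs);
    if (legs_ok == 0)
        return COMPLETE_EIO;  /* never pretend unwritten data was written */
    if (legs_ok < nr_legs || !log_ok)
        return HOLD;          /* keep running without human intervention */
    return COMPLETE_OK;
}
```

There is deliberately no outcome that waits for the administrator: a partial failure is resolved automatically once it has been recorded, and only the case where no leg holds the data (both legs, for a two-leg mirror) is reported as an error.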