On Tue, 15 Mar 2011 07:28:00 +0000 "Kwolek, Adam" <adam.kwolek@xxxxxxxxx>
wrote:

> > -----Original Message-----
> > From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of NeilBrown
> > Sent: Monday, March 14, 2011 10:54 PM
> > To: Kwolek, Adam
> > Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed; Neubauer, Wojciech
> > Subject: Re: [PATCH 0/3] UT and error case changes
> >
> > On Mon, 14 Mar 2011 15:09:20 +0100 Adam Kwolek <adam.kwolek@xxxxxxxxx> wrote:
> >
> > > The following series implements 2 changes:
> > > 1. Fix for unit tests failure.
> > >    UT suits 12 and 13 fails, when backup file cannot be opened
> > >    for grow operation (backup file exists already).
> >
> > Thanks - applied.
> >
> > > 2. I've got proposal for handling/rollback metadata in error case.
> > >    Currently in case of error external metadata can remain in reshape state.
> > >    In some cases metadata can be automatically restored to initial state
> > >    (i.e. metadata during imsm container operation can be rolled back
> > >    when error occurs on first reshaped array before reshape is started).
> > >    For such cases, additional superswitch function can be introduced.
> > >
> > >    Metadata shouldn't be rolled back in restart case.
> > >    I'm passing restart flag to abort function in Grow.c only, as this is general rule.
> > >    In the same way array reshape state is checked.
> > >
> > >    This is proposal, so I've put no implementation in to imsm handler (no metadata update is created yet).
> > >    Please let me know your opinion. If you will like it, I'll fill out gaps in imsm code.
> >
> > I'm not sure.. it sounds like it might be a good idea, but I'd like to have
> > some concrete examples to help me think about it.
>
> I've got this idea during correcting UT so, i.e.:
> wrong/already existing/ backup file name passed by used.
> Backup file verification/opening is when metadata is in reshape state already.
> This error causes mdadm exit at reshape position == 0. It is too early position for reshape restart.
> User has manually restore metadata information and this can be quite difficult.
>
> Metadata rollback can be possible for container operation if:
> 1. external metadata case (checked in Grow.c)
> 2. This is action on first array in container (check in metadata handler)
> 2. md doesn't start reshape (checked in Grow.c)
> 3. Other conditions/metadata specific (checked in Grow.c)
>
> We can also pass some additional status to reshape_container() to indicate that metadata is rolled back
> and allow for container unfreeze in such case also.
> This can allow mdadm to leave array/container in state that work is started from (hopefully ;)).
>
> BR
> Adam
>

I think I see your point - there are some issues with some failure
possibilities at start-up.

We should of course do as much checking as possible before committing to the
reshape, and I think we do.  But it is still possible that something could
go wrong.

It would probably be good to take the first backup after freezing IO, but
before updating the metadata to show that a reshape has started.  That would
ensure that a crash at any time will either be able to replay the backup and
continue the reshape, or won't see the reshape at all.
It might be a bit awkward to do that with the current code, I'm not sure.

Another possibility is to at least write out a 'zeroed' backup file and
arrange that when we restart an array, if the backup file is empty, then that
is a clear sign that no reshape has started and
->update_super("_reshape_progress") could revert the reshape.

Longer term: I want to eventually teach md to be able to revert a reshape
that has already started.  In that case we will need some sort of metadata
method to record that reshape is going backwards - though it may end up
looking like something we already have...
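To make the 'zeroed backup file' idea above a bit more concrete, here is a rough sketch of the decision it implies at restart. It is written in Python for brevity rather than in mdadm's C, and it does not reflect mdadm's real backup-file format. All function names, the 4096-byte size and the "revert"/"replay" labels are hypothetical, chosen only for illustration.

```python
import os

# Hypothetical size of the placeholder backup; not mdadm's real layout.
BACKUP_SIZE = 4096


def write_zeroed_backup(path, size=BACKUP_SIZE):
    """Create an all-zero backup file and force it to stable storage.

    In the scheme discussed above this would happen before the metadata
    is updated to say that a reshape has started.
    """
    with open(path, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        os.fsync(f.fileno())


def backup_is_empty(path, chunk=65536):
    """Return True if the file contains no non-zero byte (or no bytes)."""
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                return True      # reached EOF without seeing data
            if any(block):
                return False     # some real backup data was written


def restart_action(path):
    """Decide what a restart should do (labels are illustrative only).

    An empty backup is taken as the sign that no reshape has started,
    so the reshape state in the metadata can be reverted. Otherwise the
    backup is replayed and the reshape continues.
    """
    return "revert" if backup_is_empty(path) else "replay"
```

In this sketch a zero-length file is also treated as empty, which seems to match the intent but is an assumption on my side.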
NeilBrown