Re: [TuxOnIce-users] Repeatable md OOPS on suspend, 2.6.39.4 and 3.0.3

NeilBrown <neilb@xxxxxxx> · Thu, 15 Sep 2011 05:31:39 +0200

On Thu, 15 Sep 2011 09:32:10 +1000 Nigel Cunningham <nigel@xxxxxxxxxxxx>
wrote:

> Hi.
> 
> Please try/review the attached patch.
> 
> The problem is that TuxOnIce adds a BUG_ON() to catch non-TuxOnIce I/O
> during hibernation, as a method of seeking to stop on-disk data getting
> corrupted by the writing of data that has potentially been overwritten
> by the atomic copy.
> 
> Stopping the md devices from being marked readonly is the right thing to
> do - if we don't resume, we want recovery to be run. If we do resume,
> they should still be in the pre-hibernate state.
> 
> Regards,
> 
> Nigel

This doesn't feel like the right approach to me.

I think the 'md' device *should* be marked 'clean' when it is clean to
avoid unnecessary resyncs.

It would almost certainly make sense to have a way to tell md 'hibernate
wrote to your device so things might have changed - you should check'.
Then md could look at the metadata and refresh any in-memory information
such as device failures and event counts.
After all, if a device fails while writing out the hibernation image, we want
the hibernation to succeed (I assume) and we want md to know that the device
has failed when it wakes back up, and currently it won't.  So we really need
that notification anyway.
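
To make the idea a bit more concrete, here is a toy userspace model of what
such a "you should check" step might do. It is only a sketch and not md code:
the class names, the fields and recheck_after_hibernate() are all made up for
illustration, and it assumes the member with the highest event count holds the
newest metadata.

```python
from dataclasses import dataclass, field


@dataclass
class OnDiskSuperblock:
    """Hypothetical stand-in for the metadata stored on one member device."""
    events: int
    failed: set = field(default_factory=set)


@dataclass
class ArrayState:
    """Hypothetical in-memory view of the array carried across hibernation."""
    events: int
    failed: set = field(default_factory=set)


def recheck_after_hibernate(state, superblocks):
    """Refresh `state` from the superblocks read back after resume.

    `superblocks` maps a device name to its OnDiskSuperblock.
    Returns True if the in-memory state had to be updated.
    """
    # Assumption: the highest event count marks the newest metadata.
    newest = max(superblocks.values(), key=lambda sb: sb.events)

    # Nothing on disk is newer than what we already hold in memory.
    if newest.events <= state.events and newest.failed <= state.failed:
        return False

    # Pick up event-count bumps and failures recorded while the
    # hibernation image was being written.
    state.events = max(state.events, newest.events)
    state.failed |= newest.failed
    return True
```

In this model, a device that failed during the image write shows up as a
higher event count plus a failure recorded on the surviving member, and the
recheck folds both into the in-memory state on wakeup.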

??

Thanks,
NeilBrown