Re: Filesystem corruption on RAID1

Wols Lists <antlists@xxxxxxxxxxxxxxx> · Thu, 17 Aug 2017 23:51:03 +0100

On 17/08/17 21:50, Gionatan Danti wrote:
> 
> It's more complex, actually. The hardware did not "lie" to me, as it
> correctly flushes caches when instructed to do so.
> The problem is that a micro-powerloss wiped the cache *before* the drive
> had a chance to flush it, and the operating system did not detect this
> condition.

Except that that is not what should be happening. I don't know my hard
drive details, but I believe drives have an instruction "async write
this data and let me know when you have done so".

This should NOT return "yes I've flushed it TO cache". Which is how you
get your problem - the level above thinks it's been safely flushed to
disk (because the disk has said "yes I've got it"), but it then gets
lost because of your power fluctuation. It should only acknowledge it
*after* it's been flushed *from* cache.

And acknowledging from cache is apparently exactly what cheap drives do ...
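
To make the difference concrete, here is a toy model in Python. It is
only a sketch: ToyDrive, ack_from_cache and survives_power_loss are
names I made up for illustration, not any real drive or kernel
interface. The one thing it models is *when* the acknowledgement is
sent relative to the data reaching persistent media.

```python
class ToyDrive:
    """Toy model of a drive with a volatile write cache (not a real API)."""

    def __init__(self, ack_from_cache):
        # True models the misbehaving drive that acknowledges from cache.
        self.ack_from_cache = ack_from_cache
        self.cache = {}   # volatile: wiped by a power loss
        self.media = {}   # persistent storage

    def write(self, block, data):
        """Accept a write; return True once the drive acknowledges it."""
        self.cache[block] = data
        if not self.ack_from_cache:
            # Well-behaved drive: commit to media before acknowledging.
            self.flush()
        return True

    def flush(self):
        """Move everything from the volatile cache to persistent media."""
        self.media.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        """A power glitch wipes the cache but leaves the media intact."""
        self.cache.clear()

    def read(self, block):
        return self.cache.get(block, self.media.get(block))


def survives_power_loss(ack_from_cache):
    """Write one block, lose power, report whether the acked data is still there."""
    drive = ToyDrive(ack_from_cache)
    acked = drive.write(7, b"journal commit")
    drive.power_loss()
    return acked and drive.read(7) == b"journal commit"
```

In this model both drives acknowledge the write, but only the one that
flushes before acknowledging still has the block after the power loss.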

If the level above says "tell me when it's safely on disk", and the
drive truly does as it's told, your problem won't happen because the disk
block layer will time out waiting for the acknowledgement and retry the
write.
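
From userspace, the nearest thing to "tell me when it's safely on disk"
is fsync(). Below is a minimal Python sketch of that request;
durable_write is a hypothetical helper name, not something from this
thread. Note the assumption baked in: fsync() only asks the kernel to
push the data out to the device, so the result is only as trustworthy
as the drive's own acknowledgement, which is the whole point above.

```python
import os


def durable_write(path, data):
    """Write data to path and return only after fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        # os.write() may write fewer bytes than asked, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file's data to the storage device.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Used as durable_write("/some/path", b"payload"), it raises OSError if
the write or the fsync() fails, so the caller at least finds out when
the layers below report a problem.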

Cheers,
Wol