Re: Filesystem corruption on RAID1

Roger Heflin <rogerheflin@xxxxxxxxx> · Thu, 17 Aug 2017 07:41:00 -0500

On Thu, Aug 17, 2017 at 3:23 AM, Gionatan Danti <g.danti@xxxxxxxxxx> wrote:
> On 14/07/2017 12:46, Gionatan Danti wrote:> Hi, so a premature/preventive
> drive detachment is not a silver bullet,

> but is this the right solution)?
> - how to deal with this problem (other than being 100% sure power is never
> lost by any disks)?
>
> Thank you all,
> regards.
>

Here is a guess based on what you determined was the cause.

The mid-layer does not know the writes were lost.  The writes were in
the drive's write cache (already submitted to the drive and confirmed
back to the mid-layer as done, even though they were not yet on the
platter), and when the drive lost power and "rebooted" those writes
disappeared.  The write(s) the mid-layer still had in progress, which
never got a "done" from the drive, failed, were retried, and succeeded
once the drive reset had completed.
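
To make that sequence concrete, here is a toy model of it in Python. It
is only a sketch of the guess above, not of any real kernel or firmware
code: the Drive class, its methods and the demo() helper are all made up
for illustration.

```python
class Drive:
    """Toy model of a disk with a volatile write-back cache (hypothetical)."""

    def __init__(self, cache_enabled=True):
        self.cache_enabled = cache_enabled
        self.cache = {}    # acknowledged, but not yet on the platter
        self.platter = {}  # durable contents

    def write(self, block, data):
        """Accept a write and return True, i.e. report 'done' upward."""
        if self.cache_enabled:
            self.cache[block] = data    # acknowledged while still volatile
        else:
            self.platter[block] = data  # acknowledged only once durable
        return True

    def power_cycle(self):
        """Lose power and come back: whatever was only in cache is gone."""
        self.cache.clear()

    def read(self, block):
        return self.cache.get(block, self.platter.get(block))


def demo(cache_enabled):
    """Return the blocks the upper layer thinks are written but are not."""
    drive = Drive(cache_enabled)
    acked = {}
    for block, data in [(0, "a"), (1, "b")]:
        if drive.write(block, data):
            acked[block] = data  # the upper layer believes these are safe
    drive.power_cycle()          # power glitch before the cache was flushed
    drive.write(2, "c")          # the in-flight write is retried and succeeds
    acked[2] = "c"
    return sorted(b for b, d in acked.items() if drive.read(b) != d)


print(demo(cache_enabled=True))   # [0, 1]
print(demo(cache_enabled=False))  # []
```

With the cache on, blocks 0 and 1 were acknowledged and then silently
lost, while the retried write of block 2 looks perfectly fine, so
nothing above the drive has any reason to suspect a problem.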

In high-reliability RAID the solution is to turn off that write cache,
*but* if you do direct I/O writes (most databases) with the drive's
write cache off and no battery-backed cache between the two, then the
drive becomes horribly slow, since it must actually write the data to
the platter before telling the next level up that the data is safe.
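
If you want to see which disks the kernel currently treats as having a
volatile write cache, something like the following should work on a
reasonably recent Linux kernel. It is a minimal sketch: it assumes the
queue/write_cache attribute is present under /sys/block (it reads
"write back" or "write through"), and the function names and the "sd*"
pattern are just my choices for the example.

```python
from pathlib import Path


def write_cache_mode(disk, sysfs_root="/sys/block"):
    """Return the kernel's view of the disk cache, or None if unavailable.

    Expected values are "write back" (volatile cache in use) and
    "write through" (no volatile cache as far as the kernel knows).
    """
    attr = Path(sysfs_root) / disk / "queue" / "write_cache"
    try:
        return attr.read_text().strip()
    except OSError:
        return None  # no such disk, or the attribute is not exposed


def has_volatile_cache(disk, sysfs_root="/sys/block"):
    """True if the kernel believes the disk uses a write-back cache."""
    return write_cache_mode(disk, sysfs_root) == "write back"


if __name__ == "__main__":
    root = Path("/sys/block")
    if root.is_dir():
        for entry in sorted(root.glob("sd*")):
            print(entry.name, write_cache_mode(entry.name))
```

Note that this only reports what the kernel believes; writing to that
sysfs file changes the kernel's view, not the drive's setting. On
ATA/SATA disks the cache itself is normally switched off with
"hdparm -W 0" on the device.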