Re: Suspected corruption on ACID databases due to no barrier support in ext3 on software raid-5 and hard resets

Because you are running a database, data=ordered is not the right
mode if you want the least chance of corruption.  Use data=journal,
which journals file data as well as metadata.
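
To confirm which mode a filesystem is actually mounted with, something
like the following rough sketch may help. It assumes the options column
of /proc/mounts lists a data= entry, which may not hold on every kernel
when the default mode is in use. The device and mount point in the
sample line are made up for illustration.

```python
def data_mode(mounts_text, mount_point):
    """Return the data= journaling mode listed for mount_point, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        for opt in fields[3].split(","):
            if opt.startswith("data="):
                return opt[len("data="):]
        return None  # mount point found, but no data= option was listed
    return None  # mount point not present at all


# Hypothetical sample line; on a real system read open("/proc/mounts").read().
sample = "/dev/md0 /var/lib/db ext3 rw,data=journal 0 0\n"
print(data_mode(sample, "/var/lib/db"))
```

A result of None only means no data= option was listed, not that the
filesystem is unjournaled.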

The problem, as I understand it, is that many databases use a small
number of very large files. When you update data in the database, it
is not written to new files; existing files are modified in place.
For newly written files data=ordered would be fine, but for in-place
modification of existing files data=journal is more appropriate.
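
For illustration, the in-place update pattern described above looks
roughly like this from user space. This is a minimal sketch and not how
any particular database does it: the helper name, the temporary file and
the offsets are invented. Note that fsync() only asks the kernel to push
the data to the device; whether it reaches stable storage still depends
on the drive's write cache and on barriers being passed down.

```python
import os
import tempfile


def overwrite_in_place(path, offset, payload):
    """Overwrite bytes inside an existing file, then ask the kernel to flush."""
    fd = os.open(path, os.O_WRONLY)
    try:
        written = 0
        # pwrite may write fewer bytes than requested, so loop until done.
        while written < len(payload):
            written += os.pwrite(fd, payload[written:], offset + written)
        os.fsync(fd)  # flush file data and metadata to the device
    finally:
        os.close(fd)


# Demo on a throwaway file standing in for a large database file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"A" * 16)
    demo_path = f.name

overwrite_in_place(demo_path, 4, b"BBBB")
with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
print(result)
```

The file keeps its size and its existing blocks; only the bytes at the
given offset change, which is the case being contrasted with writing a
brand-new file.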

Of course, I might be wrong and would appreciate (polite) correction
but this is what I've been led to believe.


Now, given the hard resets, write caching on the drives is a problem,
and you may have to disable it. I've not found a substantial
performance difference with write caching enabled anyway, though that
surely depends on a great many factors.
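
If you do go the route of disabling the write cache, it has to be done
on each member disk of the array rather than on the md device. A rough
sketch, assuming ATA drives and the hdparm tool (-W0 turns the write
cache off, -W1 turns it on); the device name is only an example, and the
setting may not survive a power cycle on some drives, so it may need to
be reapplied at boot. It defaults to a dry run that just returns the
command.

```python
import subprocess


def write_cache_cmd(device, enable):
    """Build the hdparm argv that turns the drive's write cache on or off."""
    return ["hdparm", "-W1" if enable else "-W0", device]


def disable_write_cache(device, dry_run=True):
    """Return the disable command; run it only when dry_run is False."""
    cmd = write_cache_cmd(device, enable=False)
    if not dry_run:
        subprocess.run(cmd, check=True)  # needs root and a real ATA device
    return cmd


# Example device name only; substitute each member disk of the array.
print(" ".join(disable_write_cache("/dev/sda")))
```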


On Sat, Jun 28, 2008 at 3:53 AM, Leon Woestenberg
<leon.woestenberg@xxxxxxxxx> wrote:
> Hello all,
>
> we are quite sure we are hitting data corruption on a few % of cases
> on ACID* databases due to write caching enabled on drives in a
> software RAID-5 configuration with ext3 in default data=ordered mode.
>
> The machines are hard reset by a hardware watchdog when some esoteric
> PCI device misbehaves.
>
> We understand Linux software raid 5 does not pass-down barriers, is
> that correct, and is this being implemented?
>
> Also, our near-term direction of solution would be
> 0) disable write caches altogether, probably not feasible due to the
> performance regression involved.
> 1) solve the misbehave (cause of reset).
> 2) use a shorter timed software watchdog to trigger the drives into
> disabling their write caches, so that an imminent reboot has its
> commits ordered.

-- 
Jon