Re: Should disk write cache be disabled for any journalised filesystem?

Jari Ruusu <jariruusu@xxxxxxxxxxxxxxxxxxxxx> · Thu, 06 Oct 2005 18:04:59 +0300

petersen wrote:
> To my understanding, the danger is that the filesystem terminates an
> operation and updates the journal. The harddisk write cache somehow
> manages to write the updated journal info, but when about to write the
> filedata themselves, power is lost.

Yes, the danger is in the ordering of writes:
1) log intent of doing something dangerous
2) do dangerous operation
3) log dangerous operation completed

Write #2 or #3 must not hit the disk platters before write #1, and write #3
must not hit the disk platters before write #2. As long as that order holds,
journal replay on the next mount is able to fix a partially completed
operation after a power loss.
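
As a rough userspace illustration of the three-step order above (this is not
how ext3 or loop-AES is implemented; the function name, file paths and record
format are made up for the example), each step is pushed out with fsync()
before the next one is issued. Note that fsync() only gets the data as far as
the drive honours flush requests, which is exactly the write cache issue
discussed in this thread.

```python
import os

def journaled_update(journal_path, data_path, payload):
    """Hypothetical helper: apply `payload` using the intent/do/commit order."""
    jfd = os.open(journal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    dfd = os.open(data_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        # 1) log intent of doing something dangerous
        os.write(jfd, b"BEGIN %d\n" % len(payload))
        os.fsync(jfd)   # do not start step 2 until the intent is flushed

        # 2) do dangerous operation
        os.write(dfd, payload)
        os.fsync(dfd)   # do not start step 3 until the data is flushed

        # 3) log dangerous operation completed
        os.write(jfd, b"COMMIT\n")
        os.fsync(jfd)
    finally:
        os.close(jfd)
        os.close(dfd)
```

A reader of the journal that finds BEGIN without a matching COMMIT knows the
data file may be half written and can redo or discard the operation.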

The problem with an enabled disk write cache is that the disk may say "write
complete" to the kernel driver before the data hits the disk platters, and
after that the disk may re-order multiple pending writes.

> Wouldn't that be a general problem with any journalised filesystem?

Yes.

> If so, as most OS'es nowadays have journalised filesystems, do
> modern harddisks have ways to prevent such problems, or does the harddisk
> (filesystem?) driver implement some 'sync'-function before committing the
> journal?

To get write ordering right, the kernel driver must issue a cache flush
command to the disk, or in cases where a cache flush command is not
available, a cache disable + cache enable command sequence may also flush
pending writes.

That is what block I/O write barriers do. 2.6 kernels now support them on
some block devices. The device backed loop-AES driver maintains correct write
order and supports write barriers if the underlying device supports write
barriers. The mainline loop driver supports neither barriers nor correct
ordering of writes.
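
To make the effect of a flush concrete, here is a toy simulation (nothing in
it is kernel or loop-AES code; SimDisk, run() and consistent() are invented
for this sketch). The simulated disk acknowledges writes immediately, keeps
them in a volatile cache, and on power loss gets an arbitrary subset of them
onto the platters. A flush after each write stands in for a barrier, which is
a simplification of what real barriers do.

```python
import random

class SimDisk:
    """Toy disk: writes are acknowledged at once but sit in a volatile cache."""
    def __init__(self):
        self.platters = {}   # block name -> data that survives power loss
        self.cache = []      # pending (block, data) writes, not yet durable

    def write(self, block, data):
        self.cache.append((block, data))   # "write complete" is reported here

    def flush(self):
        # Cache flush: everything pending reaches the platters.
        for block, data in self.cache:
            self.platters[block] = data
        self.cache.clear()

    def power_loss(self, rng):
        # The drive got to an arbitrary subset of pending writes, in any order.
        survivors = rng.sample(self.cache, rng.randint(0, len(self.cache)))
        for block, data in survivors:
            self.platters[block] = data
        self.cache.clear()

def run(use_barriers, crash_after, seed):
    """Issue the three writes, cutting power after `crash_after` of them."""
    rng = random.Random(seed)
    disk = SimDisk()
    steps = [("intent", "x"), ("data", "x"), ("commit", "x")]
    for block, data in steps[:crash_after]:
        disk.write(block, data)
        if use_barriers:
            disk.flush()
    disk.power_loss(rng)
    return disk.platters

def consistent(platters):
    """A later step may be on the platters only if all earlier ones are."""
    present = [name in platters for name in ("intent", "data", "commit")]
    return present == sorted(present, reverse=True)
```

With use_barriers=True every crash point leaves a state that journal replay
could handle. With use_barriers=False some seeds leave "commit" on the
platters without "data", which is the failure described above.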

> Perhaps I should clarify the question: I was thinking of journalising
> filesystems (ext3) on nonencrypted drives as well, e.g. shouldn't
> _any_ ext3-user, also someone not using encryption, disable
> write-cache? Or is this case different because you have:
> 
> ext3
>   |
>   v
> loop
>   |
>   v
> physical driver, /dev/hda

The same write cache problems and solutions apply both to device backed
loop-AES and to an ext3 file system directly on a partition. A journaling file
system on a file backed loop is FUBAR on both mainline and loop-AES versions.

> In the end it comes down to finding the risk of just using
> write-cache anyway, with loop-aes and ext3, I mean, ext3 &
> write-cache disk are probably what most people use today, and it
> seems to me even in the write-cache enabled case, ext3 loses far less
> data than ext2.

If a box has any data that is worth something, it probably has a UPS. On
UPS powered boxes, it is best to leave disk write caches enabled.

> On another topic: shouldn't the KEYSCRUB option be enabled by default?

Maybe. The performance cost is less than 1%. The keyscrub version needs to
allocate about twice the amount of RAM to hold expanded encryption keys: 40 KB
per initialized device for normal multi-key, 76 KB for the keyscrub version.
Not everyone likes that.
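
For a sense of scale, the arithmetic can be sketched as follows. The two
per-device figures are the ones quoted above; the helper name and the device
count of 8 are only assumptions for the example.

```python
NORMAL_KB = 40     # per initialized device, normal multi-key
KEYSCRUB_KB = 76   # per initialized device, keyscrub version

def key_ram_kb(devices, keyscrub):
    """Total RAM in KB for expanded keys across `devices` loop devices."""
    return devices * (KEYSCRUB_KB if keyscrub else NORMAL_KB)

# 76 / 40 = 1.9, i.e. "about twice"
ratio = KEYSCRUB_KB / NORMAL_KB

# Extra RAM keyscrub would cost with 8 devices: 608 KB - 320 KB = 288 KB
extra_for_8 = key_ram_kb(8, True) - key_ram_kb(8, False)
```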

-- 
Jari Ruusu  1024R/3A220F51 5B 4B F9 BB D3 3F 52 E9  DB 1D EB E3 24 0E A9 DD

-
Linux-crypto:  cryptography in and on the Linux system
Archive:       http://mail.nl.linux.org/linux-crypto/