Mathieu Avila wrote:
On Mon, 31 Mar 2008 11:54:20 +0100,
Steven Whitehouse <swhiteho@xxxxxxxxxx> wrote:
Hi,
Hi,
Both GFS1 and GFS2 are safe from this problem since neither of them
uses barriers. Instead we do a flush at the critical points to ensure
that all data is on disk before proceeding with the next stage.
I don't think this solves the problem.
I agree. Maybe this is one of the root causes of the customer site
corruption reports we've seen in the past but were never able to
explain or reproduce.
However, without fully understanding how the Linux IO layer, block device,
and/or volume manager handle this issue, it is difficult to comment on your
patch. It is not safe to assume that if it works on ext3, it will work
on gfs1/2.
Don't rush - give people some time to think about this problem. Or use a
NetApp SAN, it has NVRAM and embedded logic to handle this :) ...
-- Wendy
Consider a cheap iSCSI disk (no NVRAM, no UPS) accessed by all my GFS
nodes; this disk has its write cache enabled, which means it will
acknowledge write requests as completed even if the data has not really
been written to the platters. The disk (like most disks nowadays) has
some logic that allows it to optimize writes by re-scheduling them. It is
possible that all writes are ACK'd before the power failure, but only a
fraction of them were really performed: some from before the flush, some
from after the flush.
Not all block writes issued before the flush were performed, but some
blocks issued after the flush were written -> the FS is corrupted.
So, after the power failure, all data in the disk's write cache is
lost. If the journal data was still in the disk cache, the journal was
never written to disk, but other metadata had already been written, so
there are metadata inconsistencies (as the sketch below illustrates).
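To make the ordering requirement concrete, here is a rough user-space
analogy (illustrative only, not GFS code; the file descriptor, offsets
and buffers are hypothetical). The commit record must not reach stable
storage before the journal blocks it describes, yet on a drive with a
volatile write cache a plain flush of that era could return once the
drive has ACK'd the data, leaving the drive free to reorder the steps:

#include <sys/types.h>
#include <unistd.h>

/*
 * Illustrative journal commit sequence.  The intent is:
 *   1. write the journal blocks
 *   2. wait until they are stable        <-- the "flush"
 *   3. write the commit record
 * If step 2 only waits for the drive's ACK (data sitting in its
 * volatile cache), the drive may still put the commit record on the
 * platters first and lose the journal blocks on power failure.
 */
static int commit_transaction(int fd,
                              const void *jblocks, size_t jlen, off_t joff,
                              const void *commit_rec, size_t clen, off_t coff)
{
        if (pwrite(fd, jblocks, jlen, joff) != (ssize_t)jlen)
                return -1;
        if (fdatasync(fd))              /* waits for completion...          */
                return -1;              /* ...but maybe not for the platter */
        if (pwrite(fd, commit_rec, clen, coff) != (ssize_t)clen)
                return -1;
        return fdatasync(fd);
}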
This is the problem that I/O barriers try to solve: they force the
block device (and the block layer) to write every block issued before
the barrier before any block issued after the barrier starts being
written.
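If true barriers are unavailable, the closest substitute is an explicit
cache flush between the two groups of writes, so that everything issued
before the flush is on the platters before anything issued after it.
Below is a hedged sketch (not taken from GFS), assuming a 2.6-era kernel
where blkdev_issue_flush() is available and takes the block device plus
an optional error-sector pointer:

#include <linux/blkdev.h>
#include <linux/errno.h>

/*
 * Force the device's volatile write cache out to stable storage.
 * Called between "write the journal" and "write the commit record",
 * this gives the ordering guarantee described above, at the cost of
 * draining the whole cache rather than just ordering requests around
 * one barrier write.
 */
static int flush_device_cache(struct block_device *bdev)
{
        int ret = blkdev_issue_flush(bdev, NULL);

        if (ret == -EOPNOTSUPP)
                /* No cache-flush command (or write cache disabled):
                 * ordinary request completion is all we can rely on. */
                ret = 0;
        return ret;
}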
The other solution is to completely disable the write cache of the
disks, but this leads to dramatically worse performance.
Using barriers can improve performance in certain cases, but we've not
yet implemented them in GFS2,
Steve.
On Mon, 2008-03-31 at 12:46 +0200, Mathieu Avila wrote:
Hello all again,
More information on this topic:
http://lkml.org/lkml/2007/5/25/71
I guess the problem also applies to GFS2.
--
Mathieu
On Fri, 28 Mar 2008 15:34:58 +0100,
Mathieu Avila <mathieu.avila@xxxxxxxxxxxx> wrote:
Hello GFS team,
Some recent kernel developments have brought IO barriers into the
kernel to prevent corruption that can happen when blocks are
reordered before being written, by the kernel or by the block device
itself, just before an electrical power failure.
(On high-end block devices with UPS or NVRAM, those problems
cannot happen.)
Some file systems implement them, notably ext3 and XFS. It seems
to me that GFS1 has no such thing.
Do you plan to implement it? If so, could the attached patch do
the job? It's incomplete: it would need a global tunable like
fast_statfs, and a mount option as is done for ext3. The code
is mainly a copy-paste from JBD, and issues a barrier only for
journal metadata. (Should I do it for other metadata?)
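For reference, here is a rough sketch of the JBD-style commit-block
write that such a patch would borrow from, assuming the 2.6-era
buffer-head API (set_buffer_ordered/clear_buffer_ordered,
sync_dirty_buffer) and stripped of its journal context; it is
illustrative, not the patch itself:

#include <linux/buffer_head.h>
#include <linux/errno.h>

/*
 * Write the journal commit block.  When barriers are enabled the buffer
 * is marked ordered (BH_Ordered), which asks the block layer to complete
 * everything issued before it first.  If the device rejects the barrier
 * (-EOPNOTSUPP), fall back to a plain synchronous write; the caller
 * should then stop requesting barriers for this journal.
 */
static int write_commit_block(struct buffer_head *bh, int use_barrier)
{
        int ret;

        set_buffer_dirty(bh);
        if (use_barrier)
                set_buffer_ordered(bh);
        ret = sync_dirty_buffer(bh);            /* lock, submit, wait */
        if (use_barrier)
                clear_buffer_ordered(bh);

        if (ret == -EOPNOTSUPP && use_barrier) {
                set_buffer_uptodate(bh);
                set_buffer_dirty(bh);
                ret = sync_dirty_buffer(bh);    /* retry without barrier */
        }
        return ret;
}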
Thanks,
--
Mathieu
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster