Re: [RFC] relaxed barrier semantics

James Bottomley, on 07/30/2010 03:28 AM wrote:
> On Thu, 2010-07-29 at 19:04 -0400, Ted Ts'o wrote:
> > On Thu, Jul 29, 2010 at 04:30:54PM -0600, Andreas Dilger wrote:
> > > Like James wrote, this is basically everything FUA.  It is OK for
> > > ordered mode to allow the device to aggregate the normal filesystem
> > > and journal IO, but when the commit block is written it should flush
> > > all of the previously written data to disk.  This still allows
> > > request re-ordering and merging inside the device, but orders the
> > > data vs. the commit block.  Having the proposed "flush ranges"
> > > interface to the disk would be ideal, since there would be no wasted
> > > time flushing data that does not need it (i.e. other partitions).
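
The ordering described above (the device may reorder and merge ordinary
writes, but all of them must be on media before the commit block) can be
sketched with a toy model. This is only an illustration in Python, not
block-layer code: ToyDisk, commit_transaction and every method name below
are hypothetical, and the "device" is just two dictionaries standing in
for a volatile write-back cache and the media.

```python
import random


class ToyDisk:
    """Toy disk with a volatile write-back cache (hypothetical, not a real API)."""

    def __init__(self, seed=0):
        self.cache = {}   # block number -> data, lost on power failure
        self.media = {}   # block number -> data, survives power failure
        self._rng = random.Random(seed)

    def write(self, block, data):
        # An ordinary write only reaches the volatile cache.
        self.cache[block] = data

    def flush(self):
        # The device may drain its cache in any order it likes.
        blocks = list(self.cache)
        self._rng.shuffle(blocks)
        for block in blocks:
            self.media[block] = self.cache[block]
        self.cache.clear()

    def write_fua(self, block, data):
        # FUA: completes only once this one block is on media.
        # Drop any stale cached copy so a later flush cannot overwrite it.
        self.cache.pop(block, None)
        self.media[block] = data

    def power_fail(self):
        # Everything still in the volatile cache is lost.
        self.cache.clear()


def commit_transaction(disk, data_blocks, commit_block):
    # 1. Data and journal writes: free to be reordered inside the device.
    for block, data in data_blocks.items():
        disk.write(block, data)
    # 2. Flush: everything written so far is now on media.
    disk.flush()
    # 3. Commit block with FUA: on media before the call returns.
    disk.write_fua(commit_block, b"COMMIT")
```

If step 2 is dropped and only write_fua() is issued for the commit block,
the commit record can reach media while the data it covers is still lost
in power_fail(), which is the case the flush is there to prevent.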

> > My understanding is that "everything FUA" can be a performance
> > disaster.  That's because it bypasses the track buffer, and things get
> > written directly to disk.  So there is no possibility to reorder
> > buffers so that they get written in one disk rotation.  Depending on
> > the disk, it might even be that if you send N sequential sectors all
> > tagged with FUA, it could be slower than sending the N sectors
> > followed by a cache flush or SYNCHRONIZE_CACHE command.
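
The concern above can be put as a rough counting model. This is a sketch
under stated assumptions, not a measurement or a description of any real
drive: it assumes each FUA write costs one separate media access, while a
flush lets the device merge cached sectors into contiguous runs and write
each run in one pass. The function names are hypothetical, and whether
real devices behave this way is exactly what is disputed in the reply
that follows.

```python
def contiguous_runs(blocks):
    """Count maximal runs of consecutive block numbers."""
    runs = 0
    prev = None
    for block in sorted(set(blocks)):
        if prev is None or block != prev + 1:
            runs += 1
        prev = block
    return runs


def media_accesses_all_fua(blocks):
    # Assumption: every FUA write goes to media on its own, with no merging.
    return len(blocks)


def media_accesses_write_then_flush(blocks):
    # Assumption: the cache merges adjacent sectors, so one flush needs
    # one media access per contiguous run.
    return contiguous_runs(blocks)
```

Under these assumptions, N sequential sectors cost N accesses when all are
tagged FUA and one access when written normally and followed by a single
flush; for scattered sectors the two counts converge.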

> I think we're getting into disk differences here.  This certainly isn't
> correct for SCSI disks.  The standard enterprise configuration for a
> SCSI disk is actually cache set to write through ... so FUA is a nop.
> Even for Write Back cache SCSI devices, FUA is just a wait until I/O is
> on media, which is pretty much equivalent to the write through case for
> the given cache lines.
>
> I can see the problems you describe possibly affecting ATA devices with
> less sophisticated caches ... but, realistically, SATA and SAS devices
> come from virtually the same manufacturing process ... I'd be really
> surprised if they didn't share caching technologies.

Please, don't limit consideration to local disks only!

Vlad