On Thu, Dec 14, 2006 at 13:21:11 -0800,
  Ron Mayer <rm_pg@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Bruno Wolff III wrote:
> > On Thu, Dec 14, 2006 at 01:39:00 -0500,
> >   Jim Nasby <decibel@xxxxxxxxxxx> wrote:
> >> On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote:
> >>> This appears to be changing under Linux. Recent kernels have write
> >>> barriers implemented using cache flush commands (which
> >>> some drives ignore, so you need to be careful).
>
> Is it true that some drives ignore this; or is it mostly
> an urban legend that was started by testers that didn't
> have kernels with write barrier support. I'd be especially
> interested in knowing if there are any currently available
> drives which ignore those commands.

I saw posts claiming this, but none of them named specific drives. I did
see one post claiming that the cache flush command is mandated (not
optional) by the spec.

> >>> In very recent kernels, software raid using raid 1 will also
> >>> handle write barriers. To get this feature, you are supposed to
> >>> mount ext3 file systems with the barrier=1 option. For other file
> >>> systems, the parameter may need to be different.
>
> With XFS the default is apparently to enable write barrier
> support unless you explicitly disable it with the nobarrier mount option.
> It also will warn you in the system log if the underlying device
> doesn't have write barrier support.

I think there might be a similar patch for ext3 going into 2.6.19. I
haven't checked a 2.6.19 kernel to make sure, though.

>
> SGI recommends that you use the "nobarrier" mount option if you do
> have a persistent (battery backed) write cache on your raid device.
>
>   http://oss.sgi.com/projects/xfs/faq.html#wcache
>
>
> >> But would that actually provide a meaningful benefit? When you
> >> COMMIT, the WAL data must hit non-volatile storage of some kind,
> >> which without a BBU or something similar, means hitting the platter.
> >> So I don't see how enabling the disk cache will help, unless of
> >> course it's ignoring fsync.
>
> With write barriers, fsync() waits for the physical disk; but I believe
> the background writes from write() done by pdflush don't have to; so
> it's kinda like only disabling the cache for WAL files and the filesystem's
> journal, but having it enabled for the rest of your write activity (the
> tables except at checkpoints? the log file?).

Not exactly. Whenever you commit the file system log or fsync the WAL
file, all previously written blocks are flushed to the disk platter
before any new write requests are honored. So journalling semantics will
work properly.

> > Note the use case for this is more for hobbyists or development boxes. You can
> > only use it on software raid (md) 1, which rules out most "real" systems.
> >
>
> Ugh. Looking for where that's documented; and hoping it is or will soon
> work on software 1+0 as well.

I saw a comment somewhere that raid 0 posed some problems, and the
suggestion was to handle the barrier at a different level (though I
don't know how you could). So I don't believe 1+0 or 5 are currently
supported, or will be in the near term.

The other feature I would like is to be able to use write barriers with
encrypted file systems. I haven't found anything on whether anyone has
near-term plans to support that.
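
For anyone who wants to check which of their file systems are mounted
with the options discussed above, here is a minimal sketch. It assumes
that ext3 needs an explicit barrier=1 and that XFS has barriers on unless
nobarrier is given, as described in this thread, and that the options
show up in the mount table the way they were given at mount time. The
function name and the sample input are made up for illustration. It only
looks at mount options; it cannot tell you whether the drive actually
honors cache flush commands.

```python
def barrier_status(mounts_text):
    """Map mount point -> True/False for ext3 and xfs entries.

    mounts_text is text in /proc/mounts (or fstab) layout:
    device, mount point, fs type, comma-separated options, ...
    Other file system types are skipped.
    """
    status = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue  # blank line, comment, or malformed entry
        mount_point, fstype = fields[1], fields[2]
        options = fields[3].split(",")
        if fstype == "ext3":
            # Assumption from the thread: ext3 needs barrier=1 explicitly.
            status[mount_point] = "barrier=1" in options
        elif fstype == "xfs":
            # Assumption from the thread: on unless nobarrier is given.
            status[mount_point] = "nobarrier" not in options
    return status


# Hypothetical mount table, for illustration only.
sample = """\
/dev/md0 / ext3 rw,barrier=1 0 0
/dev/md1 /home ext3 rw 0 0
/dev/sdb1 /data xfs rw,nobarrier 0 0
/dev/sdc1 /scratch xfs rw 0 0
"""

if __name__ == "__main__":
    for mount_point, enabled in barrier_status(sample).items():
        print(mount_point, "barriers requested" if enabled else "no barriers")
```

On a real box you would pass in the contents of /proc/mounts instead of
the sample string, and still check the system log for warnings that the
underlying device doesn't support barriers.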