Markus Wanner wrote:
> Hi,
>
> Martijn van Oosterhout wrote:
>> And fsync better do what you're asking
>> (how fast is just a performance issue, just as long as it's done).
>
> Where are we on this issue? I've read all of this thread and the one on
> the linux-lvm mailing list as well, but still don't feel confident.
>
> In the following scenario:
>
> fsync -> filesystem -> physical disk
>
> I'm assuming the filesystem correctly issues a blkdev_issue_flush() on
> the physical disk upon fsync(), to do what it's told: flush the cache(s)
> to disk. Further, I'm also assuming the physical disk is flushable (i.e.
> it correctly implements the blkdev_issue_flush() call). Here we can be
> pretty certain that fsync works as advertised, I think.
>
> The unanswered question to me is: what happens if I add LVM in
> between, as follows:
>
> fsync -> filesystem -> device mapper (lvm) -> physical disk(s)
>
> Again, assume the filesystem issues a blkdev_issue_flush() to the lower
> layer and the physical disks are all flushable (and implement that
> correctly). How does the device mapper behave?
>
> I'd expect it to forward the blkdev_issue_flush() call to all affected
> devices and only return after the last one has confirmed and completed
> flushing its caches. Is that the case?
>
> I've also read about the newish write barriers and about filesystems
> implementing fsync with such write barriers. That seems fishy to me and
> would of course break in combination with LVM (which doesn't completely
> support write barriers, AFAIU). However, that's clearly the filesystem
> side of the story and has not much to do with whether fsync lies on top
> of LVM or not.
>
> Help in clarifying this issue greatly appreciated.
>
> Kind Regards
>
> Markus Wanner

Well, AFAIK, the summary would be:

1) adding LVM to the chain makes no difference;

2) you still need to disable the write-back cache on IDE/SATA disks for
fsync() to work properly;

3) without LVM and with the write-back cache enabled, due to current(?)
limitations in the Linux kernel, you may be less vulnerable with some
journaled filesystems (though not ext3 in data=writeback or data=ordered
mode; I'm not sure about data=journal) if you use fsync() (or O_SYNC).
"Less vulnerable" means that all pending changes except the very last one
are committed to disk. (A minimal write-and-fsync sketch is appended at
the end of this message, for reference.)

So:

- write-back cache + ext3 = unsafe
- write-back cache + other fs = (depending on the fs) [*] safer, but not 100% safe
- write-back cache + LVM + any fs = unsafe
- write-through cache + any fs = safe
- write-through cache + LVM + any fs = safe

[*] the fs must use (directly, or indirectly via a journal commit) a write
barrier on fsync(). Ext3 doesn't (it does when the inode changes, but that
happens only once a second).

If you want both speed and safety, use a battery-backed controller (and
write-through cache on the disks, but the controller should enforce that
when you plug the disks in). It's the usual "Fast, Safe, Cheap: choose two".

This is an interesting article:

http://support.microsoft.com/kb/234656/en-us/

Note how for all three kinds of disk (IDE/SATA/SCSI) they say: "Disk
caching should be disabled in order to use the drive with SQL Server".
They don't mention write barriers.

.TM.
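
Appended for reference: a minimal, illustrative sketch of the
write-and-fsync pattern the discussion above hinges on. This is my own
toy example, not code from PostgreSQL or the kernel, and the path
"testfile" is just a placeholder. The point is that even when fsync()
returns 0, the data is only truly durable if every layer underneath
(filesystem, device mapper, disk write cache) honors the flush.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const char buf[] = "data that must survive a power failure\n";

    /* "testfile" is only a placeholder path for this sketch. */
    int fd = open("testfile", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    if (write(fd, buf, sizeof(buf) - 1) != (ssize_t)(sizeof(buf) - 1)) {
        perror("write");
        close(fd);
        return EXIT_FAILURE;
    }

    /*
     * Ask the kernel to push the file's data and metadata to stable
     * storage.  A return value of 0 only means the layers below claimed
     * success: with a write-back disk cache and no working flush/barrier
     * path, the blocks may still sit in the drive's cache when power is
     * lost.
     */
    if (fsync(fd) != 0) {
        perror("fsync");
        close(fd);
        return EXIT_FAILURE;
    }

    close(fd);
    return EXIT_SUCCESS;
}

On a drive with its write-back cache enabled (and no barrier support in
the stack), this program can report success and still lose the data on
power failure; with the cache disabled, or behind a battery-backed
controller, the same fsync() call can be trusted.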