On Tue, 20 Nov 2007, Ian G Batten wrote:

> On 20 Nov 07, at 1332, Michael R. Gettes wrote:
>
>> I am wondering about the use of fsync() on journal'd file systems
>> as described below. Shouldn't there be much less use of (or very
>> little use) of fsync() on these types of systems? Let the journal
>> layer do its job and not force it within cyrus? This would likely
>> save a lot of system overhead.
>
> fsync() forces the data to be queued to the disk. A journaling
> filesystem won't usually make any difference, because no one wants to
> keep an intent log of every 1 byte write, or the 100 overwrites of
> the same block. If you want every write() to go to disk,
> immediately, the filesystem layout doesn't really matter: it's just a
> matter of disk bandwidth. Journalling filesystems are more usually
> concerned with metadata consistency, so that the filesystem isn't
> actively corrupt if the music stops at the wrong point in a directory
> create or something.

However, an fsync on a journaled filesystem only means the data needs to be written to the journal; it doesn't mean that the journal contents need to be copied out to their final locations on disk. On ext3, if you mount with data=journal, your data goes into the journal along with the metadata, so all the system needs to do on an fsync is write to the journal (a nice sequential write), and everything is perfectly safe.
If you have data=ordered (the default for most journaled filesystems), then writing the journal alone does not make your data safe, so two writes must happen on an fsync: one for the data and one for the metadata.

For Cyrus you should have the same sort of requirements that you would have for a database server. In particular, without a battery-backed disk cache (or solid-state drive) to absorb your updates, you end up being throttled by your disk's rotation rate: you can do only a single fsync write per rotation, and that only if you don't have to seek (for a 7,200 rpm drive, that caps you at roughly 120 fsyncs per second). RAID 5/6 arrays are even worse, as almost all implementations will read the entire stripe before writing a single block (and its parity block) back out, and since the stripe is frequently larger than the OS readahead, the OS throws much of that data away immediately.

If we can identify the files that are the bottlenecks, it would be very interesting to see the result of putting them on a solid-state drive.

David Lang

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html