On Fri, Oct 29, 2010 at 11:43 AM, Robert Haas <robertmhaas@xxxxxxxxx> wrote:

> Well, we COULD keep the data in shared buffers, and then copy it into
> an mmap()'d region rather than calling write(), but I'm not sure
> there's any advantage to it.  Managing address space mappings is a
> pain in the butt.

I could see this being a *theoretical* benefit in the case that the
background writer gains the ability to write out all blocks associated
with a file in order.  In that case, you might get a win because you
could get a single mmap of the entire file, and just wholesale memcpy
blocks across, then sync/unmap it.

This, of course, assumes a few things that must hold for it to be
performant:

0) A list of blocks to be written, grouped by file, is readily
   available.
1) The pages you write to must be in the page cache, or your memcpy is
   going to fault them in.  With a plain write, you don't need the
   overwritten page in the cache.
2) Now, instead of the torn-page problem being FS block/sector sized,
   you can have a possibly arbitrary amount of the block's memory
   written when the kernel writes out the page.  You *really* need
   full-page-writes.
3) The mmap overhead required for the kernel to set up the mappings is
   less than that of the repeated syscalls of a simple write().

All those things seem like something that somebody could synthetically
benchmark to prove value before even trying to bolt it into PostgreSQL.

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@xxxxxxxxxxx                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
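For anyone who wants to try the synthetic benchmark suggested above, here is a minimal sketch of the two write paths being compared. It is not PostgreSQL code: the function names `flush_with_pwrite` and `flush_with_mmap` are made up for illustration, the blocks are assumed to sit contiguously in one buffer and to cover the file from offset 0, and `BLCKSZ` is simply set to PostgreSQL's default 8 kB block size. It says nothing about which path is faster; the two functions would still need to be wrapped in timing code and run against a cold and a warm page cache to test points 1 and 3.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192 /* assumption: PostgreSQL's default block size */

/* Path A: one pwrite() per block, then a single fsync(). */
static int flush_with_pwrite(const char *path, const char *blocks, size_t nblocks)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    for (size_t i = 0; i < nblocks; i++) {
        const char *src = blocks + i * BLCKSZ;
        size_t done = 0;
        while (done < BLCKSZ) { /* handle short writes */
            ssize_t n = pwrite(fd, src + done, BLCKSZ - done,
                               (off_t)(i * BLCKSZ + done));
            if (n <= 0) {
                close(fd);
                return -1;
            }
            done += (size_t)n;
        }
    }
    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Path B: map the whole file once, memcpy the blocks across, msync, unmap. */
static int flush_with_mmap(const char *path, const char *blocks, size_t nblocks)
{
    assert(nblocks > 0); /* mmap() rejects a zero-length mapping */
    size_t len = nblocks * BLCKSZ;
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    /* The file must be at least len bytes long, or touching the mapping
     * past end-of-file raises SIGBUS. Note this also truncates a longer file. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(map, blocks, len); /* faults in any page not already cached */
    int rc = msync(map, len, MS_SYNC);
    if (munmap(map, len) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Both functions return 0 on success and -1 on failure. A fair comparison would call each with the same buffer and block count on separate files, and would also try a variant where only a scattered subset of blocks is dirty, since that is closer to what the background writer would actually see.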