On Fri, Mar 30, 2007 at 11:19:09AM -0500, Erik Jones wrote:
> > On Fri, Mar 30, 2007 at 04:25:16PM +0200, Dimitri wrote:
> >> The problem is that while your goal is to commit as fast as
> >> possible, it's a pity to waste I/O speed just to keep a common
> >> block size... Let's say your transaction's modifications fit into
> >> 512K - you'll move much more data per second writing 512K blocks
> >> than writing 8K blocks (for the same amount of data)... Even if we
> >> rewrite the same block several times with incoming transactions,
> >> it still costs traffic, and we will process slower even though the
> >> H/W can do better. Don't think that's good, no? ;)
> >>
> >> Rgds,
> >> -Dimitri
> >
> > With block sizes you are always trading off overhead versus space
> > efficiency. Most OSes write only in 4k/8k to the underlying
> > hardware regardless of the size of the write you issue. Issuing 16
> > 512-byte writes has much more overhead than one 8k write. On the
> > light transaction end, there is no real benefit to a small write,
> > and it will slow performance in high-throughput environments. It
> > would be better to batch I/O, and I think someone is looking into
> > that.
> >
> > Ken
>
> True, and really, considering that data is only written to disk by
> the bgwriter and at checkpoints, writes are already somewhat batched.
> Also, Dimitri, I feel I should backtrack a little and point out that
> it is possible to have postgres write in 512-byte blocks (at least
> for UFS, which is what's in my head right now) if you set the
> system's logical block size to 4K and the fragment size to 512 bytes,
> and then set postgres's BLCKSZ to 512 bytes. However, as Ken has just
> pointed out, what you gain in space efficiency you lose in
> performance, so if you're working with a high-traffic database this
> wouldn't be a good idea.

Sorry for the late reply, but I was on vacation...

Folks have actually benchmarked filesystem block size on Linux and
found that block sizes larger than 8k can actually be faster. I
suppose if you had a workload that *always* worked with only
individual pages it would be a waste, but it doesn't take much
sequential reading to tip the scales.
--
Jim Nasby                                          jim@xxxxxxxxx
EnterpriseDB    http://enterprisedb.com    512.569.9461 (cell)
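
P.S. If anyone wants to see the per-call overhead Ken describes on
their own box, here's a rough C sketch that pushes the same 8K of data
through 16 512-byte write() calls and then through a single 8K write().
The file name and iteration count are made up, there's no fsync, and
the OS cache absorbs everything, so treat the numbers as relative
only (compile with gcc -std=gnu99, and -lrt on older Linux for
clock_gettime):

/* Rough, untested sketch: 16 x 512-byte writes vs. one 8K write
 * carrying the same data.  Illustrates syscall overhead only; it is
 * not a proper benchmark (no fsync, writes land in the OS cache). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define CHUNK   512
#define BLOCK   8192
#define ROUNDS  10000           /* arbitrary; 80 MB total per pass */

static double elapsed(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
    char buf[BLOCK];
    struct timespec t0, t1;
    int fd = open("blocktest.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) { perror("open"); return 1; }
    memset(buf, 'x', sizeof(buf));

    /* pass 1: 16 small writes per 8K of data */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < ROUNDS; r++)
        for (int i = 0; i < BLOCK / CHUNK; i++)
            if (write(fd, buf + i * CHUNK, CHUNK) != CHUNK)
                { perror("write"); return 1; }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("512-byte writes: %.3f s\n", elapsed(t0, t1));

    /* pass 2: one big write per 8K of data */
    if (lseek(fd, 0, SEEK_SET) < 0) { perror("lseek"); return 1; }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < ROUNDS; r++)
        if (write(fd, buf, BLOCK) != BLOCK)
            { perror("write"); return 1; }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("8K writes:       %.3f s\n", elapsed(t0, t1));

    close(fd);
    unlink("blocktest.dat");
    return 0;
}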
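
P.P.S. And on the sequential-read side of the block-size question, a
similar sketch that streams a file through read() at a few buffer
sizes. The size list is arbitrary, and OS readahead plus the cache
make a single run unreliable - use a file much larger than RAM, or
drop caches between runs, if you want numbers worth comparing:

/* Rough, untested sketch: sequentially read a file with different
 * buffer sizes to see where per-call overhead stops mattering. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    static const size_t sizes[] = { 4096, 8192, 16384, 32768 };

    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    for (size_t s = 0; s < sizeof(sizes) / sizeof(sizes[0]); s++) {
        char *buf = malloc(sizes[s]);
        int fd = open(argv[1], O_RDONLY);
        struct timespec t0, t1;
        ssize_t n;

        if (fd < 0 || buf == NULL) { perror("setup"); return 1; }
        clock_gettime(CLOCK_MONOTONIC, &t0);
        while ((n = read(fd, buf, sizes[s])) > 0)
            ;                   /* just stream through the file */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("%6zu-byte reads: %.3f s\n", sizes[s],
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
        close(fd);
        free(buf);
    }
    return 0;
}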