On Fri, Mar 30, 2007 at 11:19:09AM -0500, Erik Jones wrote:
> > On Fri, Mar 30, 2007 at 04:25:16PM +0200, Dimitri wrote:
> >> The problem is that while your goal is to commit as fast as
> >> possible, it's a pity to waste I/O speed just to keep a common
> >> block size... Let's say your transaction's modifications fit into
> >> 512K - you'll move much more data per second writing 512K blocks
> >> than writing 8K blocks (for the same amount of data)... Even if we
> >> rewrite the same block several times with incoming transactions,
> >> it still costs traffic, and we will process slower even though the
> >> H/W can do better. Don't think that's good, no? ;)
> >>
> >> Rgds,
> >> -Dimitri
> >
> > With block sizes you are always trading off overhead versus space
> > efficiency. Most OSes write only in 4k/8k to the underlying
> > hardware regardless of the size of the write you issue. Issuing 16
> > 512-byte writes has much more overhead than one 8k write. On the
> > light transaction end, there is no real benefit to a small write,
> > and it will slow performance in high-throughput environments. It
> > would be better to batch I/O, and I think someone is looking into
> > that.
> >
> > Ken
>
> True, and really, considering that data is only written to disk by
> the bgwriter and at checkpoints, writes are already somewhat batched.
> Also, Dimitri, I feel I should backtrack a little and point out that
> it is possible to have postgres write in 512-byte blocks (at least
> for UFS, which is what's in my head right now) if you set the
> system's logical block size to 4K and the fragment size to 512 bytes,
> and then set postgres's BLCKSZ to 512 bytes. However, as Ken has just
> pointed out, what you gain in space efficiency you lose in
> performance, so if you're working with a high-traffic database this
> wouldn't be a good idea.

Sorry for the late reply, but I was on vacation...

Folks have actually benchmarked filesystem block size on Linux and
found that block sizes larger than 8k can actually be faster. I
suppose if you had a workload that *always* worked with only
individual pages it would be a waste, but it doesn't take much
sequential reading to tip the scales.
--
Jim Nasby                                          jim@xxxxxxxxx
EnterpriseDB    http://enterprisedb.com    512.569.9461 (cell)
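
P.S. If anyone wants to see the per-call overhead Ken describes on
their own box, here's a rough C sketch that pushes the same 8K of data
through 16 512-byte write() calls and then through a single 8K write().
The file name and iteration count are made up, there's no fsync, and
the OS cache absorbs everything, so treat the numbers as relative
only (compile with gcc -std=gnu99, and -lrt on older Linux for
clock_gettime):

/* Rough, untested sketch: 16 x 512-byte writes vs. one 8K write
 * carrying the same data.  Illustrates syscall overhead only; it is
 * not a proper benchmark (no fsync, writes land in the OS cache). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define CHUNK   512
#define BLOCK   8192
#define ROUNDS  10000           /* arbitrary; 80 MB total per pass */

static double elapsed(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
    char buf[BLOCK];
    struct timespec t0, t1;
    int fd = open("blocktest.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) { perror("open"); return 1; }
    memset(buf, 'x', sizeof(buf));

    /* pass 1: 16 small writes per 8K of data */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < ROUNDS; r++)
        for (int i = 0; i < BLOCK / CHUNK; i++)
            if (write(fd, buf + i * CHUNK, CHUNK) != CHUNK)
                { perror("write"); return 1; }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("512-byte writes: %.3f s\n", elapsed(t0, t1));

    /* pass 2: one big write per 8K of data */
    if (lseek(fd, 0, SEEK_SET) < 0) { perror("lseek"); return 1; }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < ROUNDS; r++)
        if (write(fd, buf, BLOCK) != BLOCK)
            { perror("write"); return 1; }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("8K writes:       %.3f s\n", elapsed(t0, t1));

    close(fd);
    unlink("blocktest.dat");
    return 0;
}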
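
P.P.S. And on the sequential-read side of the block-size question, a
similar sketch that streams a file through read() at a few buffer
sizes. The size list is arbitrary, and OS readahead plus the cache
make a single run unreliable - use a file much larger than RAM, or
drop caches between runs, if you want numbers worth comparing:

/* Rough, untested sketch: sequentially read a file with different
 * buffer sizes to see where per-call overhead stops mattering. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    static const size_t sizes[] = { 4096, 8192, 16384, 32768 };

    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    for (size_t s = 0; s < sizeof(sizes) / sizeof(sizes[0]); s++) {
        char *buf = malloc(sizes[s]);
        int fd = open(argv[1], O_RDONLY);
        struct timespec t0, t1;
        ssize_t n;

        if (fd < 0 || buf == NULL) { perror("setup"); return 1; }
        clock_gettime(CLOCK_MONOTONIC, &t0);
        while ((n = read(fd, buf, sizes[s])) > 0)
            ;                   /* just stream through the file */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("%6zu-byte reads: %.3f s\n", sizes[s],
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
        close(fd);
        free(buf);
    }
    return 0;
}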