Re: Shared buffers, db transactions committed, and write IO on Solaris

Dimitri <dimitrik.fr@xxxxxxxxx> · Fri, 30 Mar 2007 15:14:35 +0200

You are right that the page size constraint is lifted, since
directio cuts out the VM filesystem cache.  However, the Solaris
kernel still issues io ops in terms of its logical block size (which
we have at the default 8K).  It can issue io ops for fragments as
small as 1/8th of the block size, but Postgres issues its io requests
in terms of the block size which means that io ops from Postgres will
be in 8K chunks which is exactly what we see when we look at our
system io stats.  In fact, if any io request is made that isn't a
multiple of 512 bytes (the disk sector size), the file system
switches back to the buffered io.
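
A minimal sketch of the size rule described above, assuming only what the paragraph states: an 8K logical block, fragments of 1/8th of a block, and a fallback to buffered io for any request that is not a multiple of 512 bytes. The function name stays_direct is hypothetical, and the real file system may apply further conditions (offset alignment, for instance) that are not modelled here.

```python
SECTOR_SIZE = 512                    # disk sector size quoted in the message
FS_BLOCK_SIZE = 8 * 1024             # logical block size (8K default)
FRAGMENT_SIZE = FS_BLOCK_SIZE // 8   # smallest fragment: 1/8th of a block


def stays_direct(request_bytes: int) -> bool:
    """Return True if the request size is a whole number of sectors.

    Simplified rule from the message: a request that is not a multiple
    of 512 bytes makes the file system fall back to buffered io.
    """
    return request_bytes > 0 and request_bytes % SECTOR_SIZE == 0


# 8K is what Postgres issues; 700 is an arbitrary unaligned size.
for size in (FS_BLOCK_SIZE, FRAGMENT_SIZE, 700):
    mode = "direct" if stays_direct(size) else "buffered (fallback)"
    print(f"{size:>5} bytes -> {mode}")
```

Both the 8K block and the 1K fragment pass the check, so the 8K chunks seen in the io stats are consistent with direct io staying in effect.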

Oh, yes, of course! You still need to respect a multiple of the
512-byte sector size on reads and writes. Sorry, I was tired :)

Then it seems to be true: the default XLOG block size is 8K, which
means that for every auto-committed transaction, even a small one, we
have to write 8K? Is there any reason to use such a big default block
size?
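
A rough sketch of the arithmetic behind that question, assuming each commit flushes whole XLOG blocks and ignoring records that straddle a block boundary. The 200-byte record size and the helper name bytes_flushed are hypothetical, chosen only for illustration.

```python
import math

XLOG_BLOCK_SIZE = 8 * 1024  # default XLOG block size from the message


def bytes_flushed(record_bytes: int, block_size: int = XLOG_BLOCK_SIZE) -> int:
    """Bytes written if a commit must flush whole blocks."""
    blocks = max(1, math.ceil(record_bytes / block_size))
    return blocks * block_size


record = 200  # hypothetical log volume of one small transaction, in bytes
for block_size in (512, 4096, XLOG_BLOCK_SIZE):
    written = bytes_flushed(record, block_size)
    print(f"block {block_size:>4}: {written} bytes written, "
          f"{written / record:.1f}x the record size")
```

Under these assumptions the amount written per small commit scales directly with the block size, which is the overhead being asked about.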

Maybe it would be a good idea to make it an 'initdb' parameter, so
that the value can be set per database server?

Rgds,
-Dimitri

>
> However, to understand the TX number mystery, I think the only
> possible solution is to run a small live test:
>
> (I'm sure you're aware you can mount/unmount forcedirectio
> dynamically?)
>
> during a stable workload do:
>
>   # mount -o remount,logging  /path_to_your_filesystem
>
> and check whether I/O volume increases along with the TX numbers,
> then switch back:
>
>   # mount -o remount,forcedirectio  /path_to_your_filesystem
>
> and see whether I/O volume decreases along with the TX numbers...

That's an excellent idea and I'll run it by the rest of our team
tomorrow.
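
One way to compare the two phases of the suggested test is to reduce each measurement window to bytes written per committed transaction. This is only a sketch: the Sample type and the function names are hypothetical, and the counters would have to come from your own io stats and TX counts for each window.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Counters for one measurement window (to be filled from real stats)."""
    bytes_written: int   # total bytes written to the filesystem
    transactions: int    # transactions committed in the same window


def bytes_per_tx(sample: Sample) -> float:
    """Average write volume per committed transaction."""
    if sample.transactions == 0:
        raise ValueError("no transactions in this window")
    return sample.bytes_written / sample.transactions


def compare(logging: Sample, directio: Sample) -> dict:
    """Per-transaction write volume for each mount mode."""
    return {
        "logging": bytes_per_tx(logging),
        "forcedirectio": bytes_per_tx(directio),
    }
```

If the per-transaction figure stays near 8K in both modes while only the TX numbers move, that would point at the block size rather than the mount option.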

erik jones <erik@xxxxxxxxxx>
software developer
615-296-0838
emma(r)