Greg Smith wrote:
> You turn on direct I/O differently under Solaris than everywhere else,
> and nobody has bothered to write the patch (trivial) and OS-specific
> code to turn it on only when appropriate (slightly trickier) to handle
> this case. There's not a lot of pressure on PostgreSQL to handle this
> case correctly when Solaris admins are used to doing direct I/O tricks
> on filesystems already, so they don't complain about it much.
I'm not sure that attitude will survive wider use of PostgreSQL on
Solaris once Project Indiana brings in more users, though - which I'm
hoping will happen.
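
For anyone who hasn't hit the wrinkle Greg mentions, the difference is
roughly the sketch below - not PostgreSQL source, and
open_datafile_direct is a name I've made up; the point is just that
Solaris wants a directio(3C) call where everyone else takes an open(2)
flag:

#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <fcntl.h>
#ifdef __sun
#include <sys/types.h>
#include <sys/fcntl.h>          /* directio() lives here on Solaris */
#endif

int
open_datafile_direct(const char *path)
{
#ifdef __sun
    /* Solaris: no open flag; direct I/O is toggled after open */
    int fd = open(path, O_RDWR);

    if (fd >= 0)
        (void) directio(fd, DIRECTIO_ON);
#else
    /* Linux, FreeBSD and friends: request it at open time */
    int fd = open(path, O_RDWR | O_DIRECT);
#endif
    return fd;
}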
> RPM of the drive. Seen it on UFS and ZFS, both seem to do the right
> thing here.
But ZFS *is* smart enough to manage the cache, albeit sometimes with
unexpected consequences, as with the 2530 described here:
http://milek.blogspot.com/.
> You seem to feel that there is an alternative here that PostgreSQL
> could take but doesn't. There is not. You either wait until writes
> hit disk, which by physical limitations only happens at RPM speed and
> therefore is too slow to commit for many cases, or you cache in the
> most reliable memory you've got and hope for the best. No software
> approach can change any of that.
Indeed I do, but my issue is that some popular operating systems (let's
try to avoid the flame war) fail to expose control of disk caches, so
the code assumes the onus is on the admin, and the documentation rightly
says so. But this is as much a failure of the POSIX API and of operating
systems to expose something that's necessary, and it seems to me rather
valuable that the application be able to work with such facilities as
they become available. Exposing the cache-flush mechanisms isn't
dangerous, and it can improve performance for non-DBMS users of the same
drives.
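
To be concrete about "as they become available": Darwin is the one
mainstream example I know of that already exposes this -
fcntl(fd, F_FULLFSYNC) asks the kernel to flush the drive's own write
cache, and PostgreSQL's fsync_writethrough wal_sync_method uses it.
A minimal sketch, with flush_through being my name rather than any
real API:

#include <fcntl.h>
#include <unistd.h>

/* Flush through the drive's volatile write cache where the OS lets us;
 * fall back to plain fsync() elsewhere and hope the kernel issues a
 * barrier. */
int
flush_through(int fd)
{
#ifdef F_FULLFSYNC
    /* Darwin: push data past the disk's own write cache */
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
    /* some filesystems reject it; fall through to fsync */
#endif
    return fsync(fd);
}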
I think manipulation of this stuff is a major concern for a DBMS that
might be used by amateur SAs, and if at all possible it should work out
of the box on common hardware. So far as I can tell, SQL Server Express
makes a pretty good attempt at it, for example. It might be enough for
initdb to whinge and fail if it thinks the disks are behaving insanely,
unless the would-be DBA sets a 'my_disks_really_are_that_fast' flag in
the config. At the moment anyone can apt-get themselves a DBMS which may
become a liability.
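
The sanity check could be as crude as the sketch below: time a burst of
synchronous writes and compare against what rotation physically allows.
A 7200 RPM spindle completes 120 revolutions a second, so thousands of
fsync()s per second to one file mean a cache is lying somewhere. All
names and thresholds here are invented for illustration:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define TRIALS 1000

/* Returns 1 if the commit rate looks physically plausible, 0 if a
 * volatile write cache appears to be absorbing the fsyncs. */
int
disks_look_sane(const char *path)
{
    char            block[8192];
    struct timeval  t0, t1;
    double          secs, rate;
    int             fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return 0;
    memset(block, 0, sizeof block);

    gettimeofday(&t0, NULL);
    for (int i = 0; i < TRIALS; i++)
    {
        if (pwrite(fd, block, sizeof block, 0) != (ssize_t) sizeof block ||
            fsync(fd) != 0)
        {
            close(fd);
            return 0;
        }
    }
    gettimeofday(&t1, NULL);
    close(fd);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    rate = TRIALS / secs;
    fprintf(stderr, "%.0f synchronous writes/sec\n", rate);

    /* 250/sec is the ceiling for a 15k RPM disk; allow some slack for
     * battery-backed controllers before whinging. */
    return rate < 500.0;
}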
At the moment:
- casual use is likely to be unreliable
- uncontrolled deferred IO can result in almost DoS-like checkpoints
These affect systems other than PostgreSQL too - but they would be
avoidable if the drive cache flush were better exposed and the IO were
staged to use it. There's no reason to block on anything but the final
IO in a WAL commit, after all, and with the deferred commit feature
(which I really like for workflow engines) intermediate WAL writes of a
configured chunk size could let the WAL drives get on with it.
Admittedly I'm assuming a non-blocking write-through - direct IO from a
background thread (process if you must) or aio; a rough sketch follows.
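
Something along these lines, using POSIX aio(7) - queue_wal_chunk and
wait_for_commit are invented names, and real WAL writing is far more
involved; the only point is that nothing blocks until the final IO:

#include <aio.h>
#include <errno.h>
#include <string.h>

static struct aiocb wal_cb;     /* one in-flight chunk, for brevity */

/* Queue an intermediate WAL chunk; returns immediately while the
 * kernel gets on with the IO. */
int
queue_wal_chunk(int fd, void *buf, size_t len, off_t offset)
{
    memset(&wal_cb, 0, sizeof wal_cb);
    wal_cb.aio_fildes = fd;
    wal_cb.aio_buf = buf;
    wal_cb.aio_nbytes = len;
    wal_cb.aio_offset = offset;
    return aio_write(&wal_cb);
}

/* Block only here, on the final IO of the commit. */
int
wait_for_commit(void)
{
    const struct aiocb *list[1] = { &wal_cb };

    while (aio_error(&wal_cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);
    return (aio_return(&wal_cb) < 0) ? -1 : 0;
}

(On most Linux systems the POSIX aio calls live in librt, so you link
with -lrt.)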
> There are plenty of cases where the so-called "lying" drives
> themselves are completely stupid on their own regardless of operating
> system.
With modern NCQ-capable drive firmware? Or just with older PATA stuff?
There's still an awful lot of FUD out there about SCSI vs IDE.
James