Greg Smith wrote:
> You turn on direct I/O differently under Solaris than everywhere else,
> and nobody has bothered to write the patch (trivial) and OS-specific
> code to turn it on only when appropriate (slightly trickier) to handle
> this case. There's not a lot of pressure on PostgreSQL to handle this
> case correctly when Solaris admins are used to doing direct I/O tricks
> on filesystems already, so they don't complain about it much.
I'm not sure that attitude will survive wider use of PostgreSQL on
Solaris once Project Indiana brings in more users, though - which I'm
hoping will happen.
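
For anyone who hasn't hit the wrinkle Greg mentions, the difference is
roughly the sketch below - not PostgreSQL source, and
open_datafile_direct is a name I've made up; the point is just that
Solaris wants a directio(3C) call where everyone else takes an open(2)
flag:

#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <fcntl.h>
#ifdef __sun
#include <sys/types.h>
#include <sys/fcntl.h>          /* directio() lives here on Solaris */
#endif

int
open_datafile_direct(const char *path)
{
#ifdef __sun
    /* Solaris: no open flag; direct I/O is toggled after open */
    int fd = open(path, O_RDWR);

    if (fd >= 0)
        (void) directio(fd, DIRECTIO_ON);
#else
    /* Linux, FreeBSD and friends: request it at open time */
    int fd = open(path, O_RDWR | O_DIRECT);
#endif
    return fd;
}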
> RPM of the drive. Seen it on UFS and ZFS, both seem to do the right
> thing here.
But ZFS *is* smart enough to manage the cache, albeit sometimes with
unexpected consequences, as with the 2530 described here:
http://milek.blogspot.com/.
> You seem to feel that there is an alternative here that PostgreSQL
> could take but doesn't. There is not. You either wait until writes
> hit disk, which by physical limitations only happens at RPM speed and
> therefore is too slow to commit for many cases, or you cache in the
> most reliable memory you've got and hope for the best. No software
> approach can change any of that.
Indeed I do, but my issue is that some popular operating systems (let's
try to avoid the flame war) fail to expose control of disk caches, so
the code assumes the onus is on the admin, and the documentation rightly
says so. But this is as much a failure of the POSIX API and of operating
systems to expose something that's necessary, and it seems to me rather
valuable that the application be able to work with such facilities as
they become available. Exposing the cache-flush mechanisms isn't
dangerous, and it can improve performance for non-DBMS users of the same
drives.
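
To be concrete about "as they become available": Darwin is the one
mainstream example I know of that already exposes this -
fcntl(fd, F_FULLFSYNC) asks the kernel to flush the drive's own write
cache, and PostgreSQL's fsync_writethrough wal_sync_method uses it.
A minimal sketch, with flush_through being my name rather than any
real API:

#include <fcntl.h>
#include <unistd.h>

/* Flush through the drive's volatile write cache where the OS lets us;
 * fall back to plain fsync() elsewhere and hope the kernel issues a
 * barrier. */
int
flush_through(int fd)
{
#ifdef F_FULLFSYNC
    /* Darwin: push data past the disk's own write cache */
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
    /* some filesystems reject it; fall through to fsync */
#endif
    return fsync(fd);
}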
I think manipulation of this stuff is a major concern for a DBMS that
might be used by amateur SAs, and if at all possible it should work out
of the box on common hardware. So far as I can tell, SQL Server Express
makes a pretty good attempt at it, for example. It might be enough for
initdb to whinge and fail if it thinks the disks are behaving insanely,
unless the would-be DBA sets a 'my_disks_really_are_that_fast' flag in
the config. At the moment anyone can apt-get themselves a DBMS which may
become a liability.
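
The sanity check could be as crude as the sketch below: time a burst of
synchronous writes and compare against what rotation physically allows.
A 7200 RPM spindle completes 120 revolutions a second, so thousands of
fsync()s per second to one file mean a cache is lying somewhere. All
names and thresholds here are invented for illustration:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define TRIALS 1000

/* Returns 1 if the commit rate looks physically plausible, 0 if a
 * volatile write cache appears to be absorbing the fsyncs. */
int
disks_look_sane(const char *path)
{
    char            block[8192];
    struct timeval  t0, t1;
    double          secs, rate;
    int             fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return 0;
    memset(block, 0, sizeof block);

    gettimeofday(&t0, NULL);
    for (int i = 0; i < TRIALS; i++)
    {
        if (pwrite(fd, block, sizeof block, 0) != (ssize_t) sizeof block ||
            fsync(fd) != 0)
        {
            close(fd);
            return 0;
        }
    }
    gettimeofday(&t1, NULL);
    close(fd);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    rate = TRIALS / secs;
    fprintf(stderr, "%.0f synchronous writes/sec\n", rate);

    /* 250/sec is the ceiling for a 15k RPM disk; allow some slack for
     * battery-backed controllers before whinging. */
    return rate < 500.0;
}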
At the moment:
- casual use is likely to be unreliable
- uncontrolled deferred IO can result in almost DoS-like checkpoints
These affect systems other than PostgreSQL too - but they would be
avoidable if the drive cache flush were better exposed and the IO were
staged to use it. There's no reason to block on anything but the final
IO in a WAL commit, after all, and with the deferred commit feature
(which I really like for workflow engines) intermediate WAL writes of a
configured chunk size could let the WAL drives get on with it.
Admittedly I'm assuming a non-blocking write-through - direct IO from a
background thread (process if you must) or aio; a rough sketch follows.
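
Something along these lines, using POSIX aio(7) - queue_wal_chunk and
wait_for_commit are invented names, and real WAL writing is far more
involved; the only point is that nothing blocks until the final IO:

#include <aio.h>
#include <errno.h>
#include <string.h>

static struct aiocb wal_cb;     /* one in-flight chunk, for brevity */

/* Queue an intermediate WAL chunk; returns immediately while the
 * kernel gets on with the IO. */
int
queue_wal_chunk(int fd, void *buf, size_t len, off_t offset)
{
    memset(&wal_cb, 0, sizeof wal_cb);
    wal_cb.aio_fildes = fd;
    wal_cb.aio_buf = buf;
    wal_cb.aio_nbytes = len;
    wal_cb.aio_offset = offset;
    return aio_write(&wal_cb);
}

/* Block only here, on the final IO of the commit. */
int
wait_for_commit(void)
{
    const struct aiocb *list[1] = { &wal_cb };

    while (aio_error(&wal_cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);
    return (aio_return(&wal_cb) < 0) ? -1 : 0;
}

(On most Linux systems the POSIX aio calls live in librt, so you link
with -lrt.)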
> There are plenty of cases where the so-called "lying" drives
> themselves are completely stupid on their own regardless of operating
> system.
With modern NCQ-capable drive firmware? Or just with older PATA stuff?
There's still an awful lot of FUD out there about SCSI vs IDE.
James