Re: POSIX file updates

Greg Smith wrote:
> You turn on direct I/O differently under Solaris than everywhere else, and nobody has bothered to write the patch (trivial) and OS-specific code to turn it on only when appropriate (slightly trickier) to handle this case. There's not a lot of pressure on PostgreSQL to handle this case correctly when Solaris admins are used to doing direct I/O tricks on filesystems already, so they don't complain about it much.
I'm not sure that will survive wider use of PostgreSQL on Solaris as Project Indiana brings in more users, though. Which I'm hoping will happen.
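To make the OS difference concrete, here is a minimal sketch of the two code paths - Linux's O_DIRECT open flag versus Solaris's directio() call. The helper name and the bare error handling are mine, not anything in PostgreSQL:

#define _GNU_SOURCE             /* exposes O_DIRECT on Linux */
#include <fcntl.h>
#include <unistd.h>

#ifdef __sun
#include <sys/types.h>
#include <sys/fcntl.h>          /* directio(), DIRECTIO_ON */
#endif

int
open_direct(const char *path)
{
#if defined(__linux__) && defined(O_DIRECT)
    /* Linux and friends: bypass the page cache by passing O_DIRECT at
     * open time. Buffers handed to read()/write() must then be aligned. */
    return open(path, O_RDWR | O_DIRECT);
#elif defined(__sun)
    /* Solaris: open the file normally, then request direct I/O with
     * the directio() advisory call. */
    int fd = open(path, O_RDWR);
    if (fd >= 0)
        directio(fd, DIRECTIO_ON);
    return fd;
#else
    /* No direct I/O facility known for this platform. */
    return open(path, O_RDWR);
#endif
}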
> [...] RPM of the drive. Seen it on UFS and ZFS, both seem to do the right thing here.
But ZFS *is* smart enough to manage the drive cache, albeit sometimes with unexpected consequences, as with the 2530 array here: http://milek.blogspot.com/.
> You seem to feel that there is an alternative here that PostgreSQL could take but doesn't. There is not. You either wait until writes hit disk, which by physical limitations only happens at RPM speed and therefore is too slow to commit for many cases, or you cache in the most reliable memory you've got and hope for the best. No software approach can change any of that.
Indeed I do, but the issue I have is that some popular operating systems (let's try to avoid the flame war) fail to expose control of the disk's write cache, so the code assumes the onus is on the admin, and the documentation rightly says so. But this is as much a failure of the POSIX API and of operating systems to expose something that is necessary, and it seems to me rather valuable for the application to be able to work with such facilities as they become available. Exposing the cache-flush mechanisms isn't dangerous, and it can improve performance for non-DBMS users of the same drives.
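For what it's worth, a minimal sketch of what "working with such facilities as they become available" can look like from the application side today: fdatasync() only reaches the platters where the kernel and filesystem issue a drive cache flush (which is exactly the gap being complained about), while Darwin at least exposes an explicit flush via F_FULLFSYNC. The helper name is mine:

#include <fcntl.h>
#include <unistd.h>

int
flush_to_platter(int fd)
{
#ifdef F_FULLFSYNC
    /* Darwin: fsync() alone does not flush the drive's write cache;
     * F_FULLFSYNC explicitly asks the drive to empty it. */
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
    /* fall through to fdatasync() if F_FULLFSYNC is unsupported here */
#endif
    /* Elsewhere: fdatasync() pushes this file's dirty data to the device,
     * but whether the drive cache is also flushed depends on the
     * filesystem and mount options (e.g. write barriers). */
    return fdatasync(fd);
}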

I think manipulation of this stuff is a major concern for a DBMS that might be administered by amateur SAs, and if at all possible it should work out of the box on common hardware. So far as I can tell, SQL Server Express makes a pretty good attempt at it, for example. It might be enough for initdb to whinge and fail if it thinks the disks are behaving implausibly fast, unless the would-be DBA sets a 'my_disks_really_are_that_fast' flag in the config (a sketch of such a check follows below). At the moment anyone can apt-get themselves a DBMS which may become a liability.
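A rough sketch of the kind of check I mean: time a run of write+fsync cycles on a scratch file and complain if they complete faster than any spinning disk could physically commit them. The 15000 RPM ceiling and the probe file name are assumptions of mine, not anything initdb actually does:

#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int
main(void)
{
    const int    loops = 200;
    /* 15000 RPM = 250 revolutions/sec; an honest drive with its write
     * cache disabled can't complete synced writes much faster than that. */
    const double max_honest_rate = 15000.0 / 60.0;
    char         buf[8192] = {0};
    struct timeval start, end;
    int          fd = open("fsync_probe.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0) { perror("open"); return 1; }

    gettimeofday(&start, NULL);
    for (int i = 0; i < loops; i++)
    {
        if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t) sizeof(buf) ||
            fsync(fd) != 0)
        { perror("write/fsync"); return 1; }
    }
    gettimeofday(&end, NULL);

    double secs = (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6;
    double rate = loops / secs;

    printf("%.0f fsyncs/sec\n", rate);
    if (rate > max_honest_rate)
        printf("warning: faster than a spinning disk can honestly commit;\n"
               "         a write cache is probably absorbing the flushes\n");

    close(fd);
    unlink("fsync_probe.tmp");
    return 0;
}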

At the moment:
- casual use is likely to be unreliable
- uncontrolled deferred I/O can result in almost DoS-like checkpoint stalls

These affect systems other than PostgreSQL too - but they would be avoidable if the drive cache flush were better exposed and the I/O staged to use it. There's no reason to block on anything but the final I/O in a WAL commit, after all, and with the deferred commit feature (which I really like for workflow engines) intermediate WAL writes of a configured chunk size could let the WAL drives get on with it (see the sketch after this paragraph). Admittedly I'm assuming a non-blocking write-through - direct I/O from a background thread (process if you must) or aio.
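Something along these lines, sketched with POSIX aio: queue intermediate chunks as they are produced and block only once, at commit, on the final flush. The function names and the fixed chunk table are illustrative, not how the real WAL writer is structured:

#include <aio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define MAX_CHUNKS 16

static struct aiocb chunk_cb[MAX_CHUNKS];
static int          nchunks = 0;

/* Hand an intermediate WAL chunk to the kernel without waiting for it. */
int
wal_write_chunk(int wal_fd, const void *buf, size_t len, off_t offset)
{
    struct aiocb *cb;

    if (nchunks >= MAX_CHUNKS)
        return -1;
    cb = &chunk_cb[nchunks++];
    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = wal_fd;
    cb->aio_buf    = (void *) buf;   /* buffer must stay live until completion */
    cb->aio_nbytes = len;
    cb->aio_offset = offset;
    return aio_write(cb);            /* returns immediately; I/O proceeds async */
}

/* At commit: collect the outstanding chunks, then do the one blocking flush. */
int
wal_commit(int wal_fd)
{
    for (int i = 0; i < nchunks; i++)
    {
        const struct aiocb *list[1] = { &chunk_cb[i] };

        while (aio_error(&chunk_cb[i]) == EINPROGRESS)
            aio_suspend(list, 1, NULL);         /* wait for this chunk */
        if (aio_return(&chunk_cb[i]) < 0)
            return -1;
    }
    nchunks = 0;
    return fdatasync(wal_fd);        /* the only point the committer blocks on */
}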

> There are plenty of cases where the so-called "lying" drives themselves are completely stupid on their own regardless of operating system.
With modern NCQ-capable drive firmware? Or just with older PATA stuff? There's an awful lot of FUD out there about SCSI vs IDE still.

James


