On Jul 9, 2009, at 3:53 AM, Yaroslav Tykhiy wrote:
On 08/07/2009, at 8:39 PM, Alban Hertroys wrote:
On Jul 8, 2009, at 2:50 AM, Yaroslav Tykhiy wrote:
IIRC prefetch tries to keep data (disk blocks?) in memory that it
fetched recently.
What you described is just a disk cache. And a trivial
implementation of prefetch would work as follows: An application or
other file/disk consumer asks the provider (driver, kernel,
whatever) to read, say, 2 disk blocks worth of data. The provider
thinks, "I know you are short-sighted; I bet you are going to ask
for more contiguous blocks very soon," so it schedules a disk read
for many more contiguous blocks than requested and caches them in
RAM. For bulk data applications such as file serving this trick
works like a charm. But other applications do truly random access and
they never come back for the prefetched blocks; in this case both
disk bandwidth and cache space are wasted. An advanced
implementation can try to distinguish sequential and random access
patterns, but in reality it appears to be a challenging task.
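To make the trade-off described above concrete, here is a minimal sketch of a fixed-size read-ahead cache. It is a toy simulation, not ZFS or FreeBSD code: the class name ReadAheadCache, the run() helper, the read-ahead size of 7 blocks, and the unbounded cache (no eviction) are all assumptions made for illustration. The "scattered" pattern is a stand-in for random access, with requests spaced far enough apart that no prefetched block is ever asked for.

```python
import random


class ReadAheadCache:
    """Toy block cache with fixed-size read-ahead (hypothetical, not ZFS code)."""

    def __init__(self, readahead):
        self.readahead = readahead  # extra contiguous blocks fetched on a miss
        self.cached = set()         # block numbers held in RAM (never evicted here)
        self.blocks_read = 0        # blocks transferred from the simulated disk
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cached:
            self.hits += 1
            return
        self.misses += 1
        # On a miss, fetch the requested block plus the next `readahead` blocks.
        for b in range(block, block + 1 + self.readahead):
            if b not in self.cached:
                self.cached.add(b)
                self.blocks_read += 1


def run(pattern, readahead):
    """Replay a list of block requests and return the cache with its counters."""
    cache = ReadAheadCache(readahead)
    for block in pattern:
        cache.read(block)
    return cache


# 1000 contiguous requests, e.g. a file server streaming one file.
sequential = list(range(1000))

# 1000 requests spaced 1000 blocks apart and shuffled: a stand-in for random
# access where nobody ever comes back for the prefetched neighbours.
scattered = [i * 1000 for i in range(1000)]
random.Random(0).shuffle(scattered)

for name, pattern in (("sequential", sequential), ("scattered", scattered)):
    c = run(pattern, readahead=7)
    print(name, "misses:", c.misses, "hits:", c.hits, "blocks read:", c.blocks_read)
```

With these inputs the sequential run needs 125 misses and reads exactly the 1000 blocks that were requested, while the scattered run misses every time and reads 8000 blocks to serve 1000 requests. The seven extra blocks per request are the wasted disk bandwidth and cache space.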
Ah yes, thanks for the correction, I now remember reading about that
before. Makes the name 'prefetch' that much more fitting, doesn't it?
And as you say, it's not that useful a feature with random access
(hadn't thought about that); in fact, I can imagine that it might
delay moving the disk-heads to the next desired (random) position as
the FS is still requesting data that it isn't going to be needing
(except for some lucky cases) - unless it manages to detect the
randomness of the access patterns. Of course, you can't predict randomness
from read requests alone, since you don't know about the requests that
are still to come. You can, however, assume something like that is the
case if historic requests turned out to be random by nature, but then
you'd want to know for which area of the FS this is the case.
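The idea of judging by historic requests, tracked per area of the FS, could be sketched roughly as follows. This is only an illustration under stated assumptions: the class PatternGuess, the record() helper, the window of 16 requests, the 0.5 threshold, and the region names are all hypothetical and are not how ZFS actually decides whether to prefetch.

```python
from collections import defaultdict, deque


class PatternGuess:
    """Guess whether recent reads in one region were sequential (hypothetical)."""

    def __init__(self, window=16, threshold=0.5):
        # One entry per request: True if it directly followed the previous block.
        self.history = deque(maxlen=window)
        self.last_block = None
        self.threshold = threshold

    def record(self, block):
        if self.last_block is not None:
            self.history.append(block == self.last_block + 1)
        self.last_block = block

    def should_prefetch(self):
        # No history yet: stay conservative and do not prefetch.
        if not self.history:
            return False
        return sum(self.history) / len(self.history) >= self.threshold


# One guess per region (e.g. per file or per dataset), created on first use.
guesses = defaultdict(PatternGuess)


def record(region, block):
    guesses[region].record(block)


# A region read front to back, and a region read in a scattered order.
for block in range(20):
    record("logs", block)
for i in range(20):
    record("tables", (i * 7919) % 10007)

print("logs:", guesses["logs"].should_prefetch())
print("tables:", guesses["tables"].should_prefetch())
```

In this sketch the "logs" region ends up with prefetch enabled and the "tables" region with prefetch disabled. It only looks backwards, so a workload that switches from random to sequential access needs several requests before the guess catches up.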
I don't know how you partitioned your zpools, but to me it seems like
it'd be preferable to have the PostgreSQL tablespaces (and possibly
other data that's likely to be accessed randomly) in a separate zpool
from the rest of the system so you can restrict disabling prefetch to
just that file-system. You probably already did that...
It could be interesting to see how clustering the relevant tables
would affect the prefetch performance; I'd expect disk access to be
less random that way. It's probably still better to disable prefetch
though.
ZFS uses quite a bit of memory, so if you distributed all your
memory to be used by just postgres and disk cache then you didn't
leave enough space for the prefetch data and _something_ will be
moved to swap.
I hope you know that FreeBSD is exceptionally good at distributing
available memory between its consumers. That said, useless prefetch
indeed puts extra pressure on disk cache and results in unnecessary
cache evictions, thus making things even worse. It is true that ZFS
is memory hungry and so rather sensitive to non-optimal memory use
patterns. Useless prefetch wastes memory that could be used to
speed up other ZFS operations.
Yes, I do know that; it's one of the reasons I prefer it over other
OSs. The keyword here was 'available memory' though, under the
assumption that something was hitting swap. But apparently that wasn't
the case.
You'll probably want to ask about this on the FreeBSD mailing lists
as well, they'll know much better than I do ;)
Are you a local FreeBSD expert? ;-) Jokes apart, I don't think this
topic has to do with FreeBSD as such; it is mostly about making the
advanced technologies of PostgreSQL and ZFS go well together. Even
ZFS developers admit that in database-related applications
exceptions to general ZFS practices and rules may be called for.
I wouldn't call myself an expert, I just use it on a few systems at
home and am more a user than an administrator. I do read the stable/
current mailing lists though (since 2004 according to my mail client)
and keep an eye on (among others) the ZFS discussions as I feel
tempted to change my gmirrors into zpools some day. It certainly looks
like an interesting FS, very flexible and reliable.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.