
Re: ZFS prefetch considered evil?


 



On 08/07/2009, at 8:39 PM, Alban Hertroys wrote:

On Jul 8, 2009, at 2:50 AM, Yaroslav Tykhiy wrote:

Hi All,

I have a mid-size database (~300G) used as an email store and running on a FreeBSD + ZFS combo. Its PG_DATA is on ZFS whilst xlog goes to a different FFS disk. ZFS prefetch was enabled by default and disk time on PG_DATA was near 100% all the time with transfer rates heavily biased to read: ~50-100M/s read vs ~2-5M/s write. A former researcher, I was going to set up disk performance monitoring to collect some history and see if disabling prefetch would have any effect, but today I had to find out the difference the hard way. Sorry, but that's why the numbers I can provide are quite approximate.

Due to a peak in user activity the server just melted down, with mail data queries taking minutes to execute. As the last resort, I rebooted the server with ZFS prefetch disabled -- it couldn't be disabled at run time in FreeBSD. Now IMAP feels much more responsive; transfer rates on PG_DATA are mostly <10M/s read and 1-2M/s write; and disk time stays way below 100% unless a bunch of email is being inserted.
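For anyone wanting to try the same thing: on FreeBSD of that vintage, file-level prefetch is controlled by a loader tunable that can only be set at boot, so the setting goes in loader.conf. A sketch, assuming the ZFS port exposes the usual `vfs.zfs.prefetch_disable` knob (verify the tunable name on your release before relying on it):

```shell
# Append to /boot/loader.conf, then reboot -- the tunable is read-only
# at run time, which is why a reboot was needed in the first place:
echo 'vfs.zfs.prefetch_disable=1' >> /boot/loader.conf

# After the reboot, confirm the setting took effect:
sysctl vfs.zfs.prefetch_disable
```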

My conclusion is that although ZFS prefetch is supposed to be adaptive and handle random access more or less gracefully, in reality it leaves plenty of room for improvement, so to speak, and for now PostgreSQL performance can benefit from simply leaving it disabled. The same may apply to other database systems as well.


Are you sure you weren't hitting swap?

A sceptic myself, I genuinely understand your doubt. But this time I was sure because I paid attention to the name of the device involved. Moreover, a thrashing system wouldn't have had such a disparity between disk read and write rates.

IIRC prefetch tries to keep data (disk blocks?) in memory that it fetched recently.

What you described is just a disk cache. A trivial implementation of prefetch would work as follows: an application or other file/disk consumer asks the provider (driver, kernel, whatever) to read, say, 2 disk blocks' worth of data. The provider thinks, "I know you are short-sighted; I bet you are going to ask for more contiguous blocks very soon," so it schedules a disk read for many more contiguous blocks than requested and caches them in RAM. For bulk-data applications such as file serving this trick works like a charm. But other applications do truly random access and never come back for the prefetched blocks; in that case both disk bandwidth and cache space are wasted. An advanced implementation can try to distinguish sequential from random access patterns, but in reality that appears to be a challenging task.
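The effect described above is easy to demonstrate with a toy model. This is a hypothetical sketch, not ZFS code: a cache that, on every miss, fetches the requested block plus a fixed number of contiguous followers. A sequential scan amortizes each disk fetch over many later hits, while random access throws almost every prefetched block away:

```python
import random

class ReadAheadCache:
    """Toy read-ahead cache: on a miss, fetch extra contiguous blocks."""

    def __init__(self, backing, readahead=8):
        self.backing = backing      # dict-like: block number -> data
        self.readahead = readahead  # extra contiguous blocks per miss
        self.cache = {}
        self.disk_reads = 0         # blocks actually pulled from "disk"

    def read(self, block):
        if block not in self.cache:
            # Miss: fetch the requested block plus `readahead` followers,
            # betting the consumer will soon ask for them too.
            for b in range(block, block + self.readahead + 1):
                if b in self.backing:
                    self.cache[b] = self.backing[b]
                    self.disk_reads += 1
        return self.cache[block]

disk = {n: f"block-{n}" for n in range(1000)}

# Sequential scan: only every 9th access misses, so prefetch pays off.
seq = ReadAheadCache(disk)
for n in range(100):
    seq.read(n)

# Random access: nearly every access misses, and the 8 extra blocks
# fetched each time are almost never used.
random.seed(1)
rnd = ReadAheadCache(disk)
for _ in range(100):
    rnd.read(random.randrange(1000))

print("sequential disk reads:", seq.disk_reads)
print("random disk reads:    ", rnd.disk_reads)
```

Serving the same 100 requests, the random workload ends up reading several times as many blocks from disk as the sequential one, which is exactly the wasted bandwidth and cache space described above.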

ZFS uses quite a bit of memory, so if you distributed all your memory to be used by just postgres and disk cache then you didn't leave enough space for the prefetch data and _something_ will be moved to swap.

I hope you know that FreeBSD is exceptionally good at distributing available memory between its consumers. That said, useless prefetch indeed puts extra pressure on disk cache and results in unnecessary cache evictions, thus making things even worse. It is true that ZFS is memory hungry and so rather sensitive to non-optimal memory use patterns. Useless prefetch wastes memory that could be used to speed up other ZFS operations.

If you're running FreeBSD i386 then ZFS requires some careful tuning due to the limits a 32-bit OS puts on memory. I recall ZFS not being very stable on i386 a while ago for those reasons, which has by now been fixed as far as possible, but it's not ideal (and it likely never will be).

I use FreeBSD/amd64 and I'm generally happy with ZFS on that platform.

You'll probably want to ask about this on the FreeBSD mailing lists as well; they'll know much better than I do ;)

Are you a local FreeBSD expert? ;-) Joking aside, I don't think this topic has much to do with FreeBSD as such; it is mostly about making the advanced technologies of PostgreSQL and ZFS work well together. Even ZFS developers admit that database workloads may call for exceptions to general ZFS practices and rules.

When I set up my next ZFS-based PostgreSQL server, I think I'll play with ZFS's recordsize property and see whether setting it to PAGESIZE makes any difference.
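For the record, PostgreSQL's default page size (BLCKSZ) is 8 kB, so the experiment would look something like the following. The pool and dataset names are made up for illustration; note that recordsize only affects files written after it is set, so it should be applied before loading the data:

```shell
# Hypothetical dataset layout -- match ZFS record size to PostgreSQL's
# 8 kB page so one page read doesn't drag in a whole 128 kB record:
zfs create tank/pgdata
zfs set recordsize=8k tank/pgdata

# Verify the property before initdb / restore:
zfs get recordsize tank/pgdata
```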

Thanks,

Yar

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
