Re: Postgresql 9.4 and ZFS?

Patric Bechtel <patric.bechtel@xxxxxxxxx> · Wed, 30 Sep 2015 15:45:43 +0200

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Tomas,

Tomas Vondra wrote on 30.09.2015 at 14:01:
> Hi,
> 
> On 09/30/2015 12:21 AM, Patric Bechtel wrote:
>> 
>> Hi Benjamin,
>> 
>> if you're using compression, forget about that. You need to synchronize the ashift value to
>> the internal rowsize of your SSD, that's it. Make sure your SSD doesn't lie to you regarding
>> writing blocks and their respective order. In that case you might even choose to set
>> sync=disabled. Also, set atime=off and relatime=on. For faster snapshot transfers, you might
>> like to set the checksum algo to SHA256.
> 
> What is "SSD rowsize"? Do you mean the size of the internal pages?

Yep. In my experience, it helps write performance a lot, at least over an extended period of
time (less write amplification).
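
For reference, ashift is the base-2 logarithm of the block size ZFS assumes for a device, so
matching it to the SSD page size is a small calculation. A minimal sketch in Python; the helper
name is made up for illustration, and the page sizes are just the 4k and 8k values discussed in
this thread, not something read from a real drive:

```python
def ashift_for_page_size(page_size_bytes: int) -> int:
    """Return the ashift (log2 of the block size) for a power-of-two page size."""
    # Reject zero, negative, and non-power-of-two sizes.
    if page_size_bytes <= 0 or page_size_bytes & (page_size_bytes - 1):
        raise ValueError("page size must be a positive power of two")
    return page_size_bytes.bit_length() - 1


# Hypothetical page sizes; check the drive's real page size before relying on this.
for size in (4096, 8192):
    print(f"{size} bytes -> ashift={ashift_for_page_size(size)}")
```

Whether the drive reports its true internal page size is a separate question, so treat the
result as a starting point for benchmarking rather than a guarantee.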

> FWIW I've been doing extensive benchmarking of ZFS (on Linux), including tests of different
> ashift values, and I see pretty much no difference between ashift=12 and ashift=13 (4k vs 8k).
> 
> To show some numbers, these are pgbench results with 16 clients:
> 
> type   scale    ashift=12   ashift=13   rsize=8k   logbias
> ----------------------------------------------------------
> ro     small        53097       53159      53696     53221
> ro     medium       42869       43112      47039     46952
> ro     large         3127        3108      27736     28027
> rw     small         6593        6301       6384      6753
> rw     medium        1902        1890       4639      5034
> rw     large          561         554       2168      2585
> 
> small=150MB, medium=2GB, large=16GB (on a machine with 8GB of RAM)
> 
> The tests are "adding" the features, i.e. the columns are actually:
> 
> * ashift=12
> * ashift=13
> * ashift=13 + recordsize=8kB
> * ashift=13 + recordsize=8kB + logbias=throughput
> 
> I've also done a few runs with compression, but that reduces the performance a bit
> (understandably).

I'm somewhat surprised by the influence of the rsize value. I will recheck that. In my case,
compression actually improved throughput quite a bit, but that might change depending on CPU speed
vs. I/O speed. Our CPUs are quite powerful, but the SSDs are just SATA Samsung/OCZ models at least
18 months old. Also, I measured the write performance over several hours, to push the internal GC
of the SSDs to its limits. We had some problems in the past with SSDs (e.g. Intel) and their
behaviour (<1MB/s), so that's why I put some emphasis on that.
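
To see how much of the gain in Tomas's numbers comes from the recordsize change alone, the
ratio between the recordsize=8kB column and the plain ashift=13 column can be computed per row.
A small Python sketch; the figures are copied from the quoted pgbench table, while the
dictionary layout and function name are only for illustration:

```python
# pgbench results (16 clients) copied from the quoted table.
# Columns are cumulative: ashift=12, ashift=13, +recordsize=8kB, +logbias=throughput.
RESULTS = {
    ("ro", "small"): (53097, 53159, 53696, 53221),
    ("ro", "medium"): (42869, 43112, 47039, 46952),
    ("ro", "large"): (3127, 3108, 27736, 28027),
    ("rw", "small"): (6593, 6301, 6384, 6753),
    ("rw", "medium"): (1902, 1890, 4639, 5034),
    ("rw", "large"): (561, 554, 2168, 2585),
}


def recordsize_gain(row):
    """Ratio of the recordsize=8kB column to the plain ashift=13 column."""
    _, ashift13, rsize8k, _ = row
    return rsize8k / ashift13


for (mode, scale), row in RESULTS.items():
    print(f"{mode} {scale:<6} x{recordsize_gain(row):.2f}")
```

The ratios stay close to 1 for the small data set and grow with the data set size, which is
the effect I want to recheck on our hardware.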

>> 
>> As always, put zfs.conf into /etc/modprobe.d with
>> 
>> options spl spl_kmem_cache_slab_limit=16384
>> options zfs zfs_arc_max=8589934592
>> 
>> You might want to adjust the zfs_arc_max value to your liking. Don't set it to more than 1/3
>> of your RAM, just saying.
> 
> Why? My understanding is that ARC cache is ~ page cache, although implemented differently and
> not as tightly integrated with the kernel, but it should release the memory when needed and
> such. Perhaps not letting it use all the RAM is a good idea, but 1/3 seems a bit too
> aggressive?

First of all: the setting is somewhat 'disregarded' by ZFS, as it's the net size of the buffer.
The gross size (with padding and alignment) isn't counted there, so in fact the cache fills up to
2/3 of the memory, which is plenty. Also, sometimes the ARC shrinking process isn't as fast
as necessary, so leaving some headroom just in case isn't a bad strategy, IMHO.
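
As a rough illustration of the 1/3 rule, the zfs_arc_max value can be derived from the amount
of RAM like this. It is only a sketch in Python: the function names are made up, and the
24 GiB machine is an assumption chosen for the example, not something stated in this thread:

```python
GIB = 1024 ** 3


def arc_max_bytes(ram_bytes: int, numerator: int = 1, denominator: int = 3) -> int:
    """Return a zfs_arc_max value (in bytes) as a fraction of total RAM."""
    # Integer arithmetic keeps the result an exact byte count.
    return ram_bytes * numerator // denominator


def modprobe_line(ram_bytes: int) -> str:
    """Format the zfs options line for /etc/modprobe.d/zfs.conf."""
    return f"options zfs zfs_arc_max={arc_max_bytes(ram_bytes)}"


# Assumption: a machine with 24 GiB of RAM (hypothetical, for illustration only).
print(modprobe_line(24 * GIB))
```

For 24 GiB this yields 8589934592 (8 GiB), the same value as in the zfs.conf example; adjust
the fraction if you want more or less headroom.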

Patric
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: GnuPT 2.5.2

iEYEARECAAYFAlYL54cACgkQfGgGu8y7ypBXKACg6fuuvzdUtDvHRbdyisJXZwxF
ORMAoK3mEQhsB+AybHTQzhZ6hR6xT+30
=9yFi
-----END PGP SIGNATURE-----

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)