Re: Postgresql 9.4 and ZFS?

Joseph Kregloh <jkregloh@xxxxxxxxxxxxxx> · Thu, 1 Oct 2015 21:04:34 -0400

On Thu, Oct 1, 2015 at 5:51 PM, Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx> wrote:
On 10/1/15 8:50 AM, Joseph Kregloh wrote:

In my testing with pgbench I actually saw a decrease in performance with a ZIL enabled. I ended up just keeping the L2ARC and dropping the ZIL. A ZIL will not provide you with any speed boost as a database. On a NAS shared over NFS, for example, a ZIL would work well. ZIL is more for data protection than anything.

I run FreeBSD 10.1 in production with an NVMe mirror for L2ARC; the rest of the storage is spinning drives, with a combination of filesystem compression settings. For example, archival tablespaces and the log folder are on gzip compression on an external array. Faster stuff like the xlog is on lz4 and on an internal array.

I'm not a ZFS expert, but my understanding is that a ZIL *that has lower latency than main storage* can be a performance win. This is similar to the idea of giving pg_xlog its own dedicated volume so that it's not competing with all the other IO traffic every time you do a COMMIT.

Recent versions of Postgres go to a lot of trouble to make fsync as painless as possible, so a ZIL might not help much in many cases. Where it could still help is if you're running synchronous_commit = true and you consistently get lower latency on the ZIL than on the vdevs; that will make every COMMIT run faster.

(BTW, this is all based on the assumption that ZFS treats fsync as a synchronous request.)
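To put a rough number on the latency argument, here is a minimal sketch. It assumes a single session issuing COMMITs one after another, each waiting on one synchronous write; the function name and both latency figures are made up for illustration and are not measurements from any system in this thread.

```python
def max_commits_per_second(sync_latency_ms: float) -> float:
    """Upper bound on serial COMMITs per second for one session
    that waits for a single synchronous write on every COMMIT."""
    if sync_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / sync_latency_ms


# Illustrative latencies only (assumptions, not benchmarks).
vdev_latency_ms = 5.0   # hypothetical spinning-disk vdev
log_latency_ms = 0.1    # hypothetical low-latency log device

if __name__ == "__main__":
    print("vdev only:  ", max_commits_per_second(vdev_latency_ms), "commits/s")
    print("fast log:   ", max_commits_per_second(log_latency_ms), "commits/s")
```

This only bounds a single serial session; concurrent sessions and grouped commits would change the picture.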

The ZIL, or ZFS Intent Log, is just a log, as the name describes. It only replays transactions that may have been lost in the event of machine failure. If the machine crashes, then on startup ZFS will replay the data stored on the ZIL drive and try to fix any errors. During runtime the ZIL is never read from, only written to.

When there is no separate ZIL device, a synchronous write is stored in RAM and in the ZIL residing on the vdev. Once ZFS acknowledges that the data is all there, it flushes the data from RAM to its final write location on the vdev.

When you have a fast ZIL device like an SSD or NVMe drive, it does the same thing: it stores the data in RAM and on the fast ZIL device. Once acknowledged, it also writes from RAM to the vdev. In theory this gives you a faster acknowledgement time.

In either case you are still "bottlenecked" by the speed of the write from RAM to the zpool. Now, for a small database with not many writes, a ZIL would be awesome. But on a write-heavy database you will be acknowledging more writes because of the ZIL than what you are physically able to write from RAM to the zpool, thereby degrading performance.
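Here is a toy model of that argument, just to make the reasoning concrete. It is a sketch under my own assumptions: the function, its parameters, and all the numbers are hypothetical, and it does not try to reproduce how ZFS actually batches or throttles writes. Writes are acknowledged as fast as the log allows while there is room in RAM, and RAM drains to the pool at a fixed rate.

```python
def simulate(offered_mb_s, log_mb_s, flush_mb_s, dirty_limit_mb, seconds):
    """Return the MB acknowledged in each second of a toy write pipeline.

    offered_mb_s   -- write load the database tries to push
    log_mb_s       -- how fast the log device can acknowledge writes
    flush_mb_s     -- how fast data moves from RAM to the pool
    dirty_limit_mb -- how much unflushed data RAM may hold
    """
    dirty = 0.0
    acked = []
    for _ in range(seconds):
        # Flush part of the buffered data from RAM to the pool.
        dirty = max(0.0, dirty - flush_mb_s)
        # New writes are accepted only while the RAM buffer has room.
        room = dirty_limit_mb - dirty
        accepted = min(offered_mb_s, log_mb_s, room)
        dirty += accepted
        acked.append(accepted)
    return acked


if __name__ == "__main__":
    # Light load: the pool keeps up, so every write is acknowledged.
    print(simulate(20, 500, 50, 200, 8))
    # Heavy load: fast acknowledgements at first, then the buffer
    # fills and throughput falls back to the pool's flush rate.
    print(simulate(100, 500, 50, 200, 8))
```

In this model the fast log only helps until the buffer fills; after that the sustained rate is whatever the pool can absorb, which is the "bottleneck" described above.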

At least this is how it works in my head.

-Joseph Kregloh

-- 

Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX

Experts in Analytics, Data Architecture and PostgreSQL

Data in Trouble? Get it in Treble! http://BlueTreble.com