On Wednesday, September 30, 2015 02:22:31 PM Tomas Vondra wrote:
> I think this really depends on the workload - if you have a lot of
> random writes, CoW filesystems will perform significantly worse than
> e.g. EXT4 or XFS, even on SSD.

I'd be curious about the information you have that leads you to this
conclusion. As with many (most?) "rules of thumb", the devil is quite
often in the details.

> > We've been running both on ZFS/CentOS 6 with excellent results, and
> > are considering putting the two together. In particular, the CoW
> > nature (and subsequent fragmentation/thrashing) of ZFS becomes
> > largely irrelevant on SSDs; the very act of wear leveling on an SSD
> > is itself a form of intentional thrashing that doesn't affect
> > performance since SSDs have no meaningful seek time.
>
> I don't think that's entirely true. Sure, SSD drives handle random I/O
> much better than rotational storage, but it's not entirely free and
> sequential I/O is still measurably faster.
>
> It's true that the drives do internal wear leveling, but it probably
> uses tricks that are impossible to do at the filesystem level (which is
> oblivious to internal details of the SSD). CoW also increases the
> amount of blocks that need to be reclaimed.
>
> In the benchmarks I've recently done on SSD, EXT4 / XFS are ~2x faster
> than ZFS. But of course, if the ZFS features are interesting for you,
> maybe it's a reasonable price.

Again, the details would be highly interesting to me. What memory
optimization was done? What was the status of snapshots? Was the pool
RAIDZ or mirrored vdevs, and how many vdevs? Was compression enabled?
What ZFS release was it, and was it running on Linux, Free/Open/NetBSD,
Solaris, or something else?

A 2x performance difference is almost inconsequential in my experience,
because hardware growth is exponential: a 2x change generally means only
1-2 years of advancement or deferment against the progression of
hardware. Our current, relatively beefy DB servers are already older
than that, and have an anticipated life cycle of at least another
couple of years.

// Our situation //

Lots of RAM for the workload: 128 GB of ECC RAM against an on-disk DB
size of ~150 GB. Essentially everything runs straight out of RAM cache,
with only writes hitting disk; SMART reports a 4/96 read/write ratio
(see the sketch at the end of this mail).

Query load: constant, heavy writes and heavy use of temp tables to
assemble very complex queries -- pretty much the "worst case" mix of
reads and writes, with an average daily peak of about 200-250
queries/second.

Hardware: 16-core Xeon servers (32 HT "cores") with SAS 3 Gbps storage;
CentOS 6 is our O/S of choice. Currently we're running Intel 710 SSDs
in a software RAID1 without TRIM enabled, and we're generally happy
with the reliability and performance we see. We're planning to upgrade
storage soon (since we're over 50% utilized) and, in the process, bring
in the magic goodness of ZFS snapshots/clones (rough plan sketched
below).
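For reference, the 4/96 figure comes straight off the drives' host
read/write counters. A minimal sketch of how we pull it, assuming
smartctl is installed and Intel-style attribute names (other vendors
name these counters differently, if they expose them at all):

    # dump all SMART attributes and pick out the host I/O counters
    smartctl -A /dev/sda | egrep -i 'host_(reads|writes)'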
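Likewise, "software RAID1" above means plain Linux md. A quick sketch
of the health checks we run, with /dev/md0 standing in for our actual
array name:

    cat /proc/mdstat            # both mirror halves present and [UU]
    mdadm --detail /dev/md0     # per-device state and resync status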
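And for the planned ZFS move, these are the dataset-level knobs we have
in mind -- a hedged sketch, assuming ZFS on Linux with a mirrored pool
named "tank" and a "tank/pgdata" dataset (the names are mine, purely
for illustration). They are also exactly the settings I'd want to see
alongside any EXT4/XFS-vs-ZFS benchmark numbers:

    zpool status tank                      # confirm mirrored vdev layout
    zfs set recordsize=8k tank/pgdata      # match PostgreSQL's 8 kB block size
    zfs set compression=lz4 tank/pgdata    # trade a little CPU for fewer writes
    zfs set atime=off tank/pgdata          # drop access-time write amplification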