Re: performance for high-volume log insertion

Greg Smith <gsmith@xxxxxxxxxxxxx> · Tue, 21 Apr 2009 13:17:41 -0400 (EDT)

On Mon, 20 Apr 2009, david@xxxxxxx wrote:

while I fully understand the 'benchmark your situation' need, this isn't 
that simple.  in this case we are trying to decide what API/interface to 
use in a infrastructure tool that will be distributed in common distros 
(it's now the default syslog package of debian and fedora), there are so 
many variables in hardware, software, and load that trying to benchmark 
it becomes effectivly impossible.

From your later comments, you're wandering a bit outside of what you were 
asking about here.  Benchmarking the *query* side of things can be 
extremely complicated.  You have to worry about memory allocation, cold 
vs. warm cache, scale of database relative to RAM, etc.

You were asking specifically about *insert* performance, which isn't 
nearly as complicated.  There are basically three setups:

1) Disk/controller has a proper write cache.  Writes and fsync will be 
fast.  You can insert a few thousand individual transactions per second.

2) Disk/controller has a "lying" write cache.  Writes and fsync will be 
fast, but it's not safe for database use.  But since (1) is expensive and 
this one you can get for free jut by using a regular SATA drive with its 
write cache enabled, you can use this case as a proxy for approximately 
how (1) would act.  You'll still get a few thousand transactions per 
second, sustained writes may slow down relative to (1) if you insert 
enough that you hit a checkpoint (triggering lots of random I/O).

3) All write caches have been disabled because they were not 
battery-backed.  This is the case if you have a regular SATA drive and you 
disable its write cache because you care about write durability.  You'll 
get a bit less than RPM/60 writes/second, so <120 inserts/second with a 
typical 7200RPM drive.  Here batching multiple INSERTs together is 
critical to get any sort of reasonable performance.

In (3), I'd expect that trivia like INSERT vs. COPY and COPY BINARY vs. 
COPY TEXT would be overwhelmed by the overhead of the commit itself. 
Therefore you probably want to test with case (2) instead, as it doesn't 
require any additional hardware but has similar performance to a 
production-worthy (1).  All of the other things you're worried about 
really don't matter here; you can get an approximate measure of what the 
performance of the various INSERT/COPY schemes are that is somewhat 
platform dependant, but the results should be good enough to give you some 
rule of thumb suggestions for whether optimizations are significant enough 
to justify the coding effort to implement them or not.

I'm not sure whether you're familiar with all the fsync trivia here.  In 
normal syslog use, there's an fsync call after every write.  You can 
disable that by placing a "-" before the file name in /etc/syslog.conf The 
thing that is going to make database-based writes very different is that 
syslog's fsync'd writes are unlikely to leave you in a bad state if the 
drive lies about them, while database writes can.  So someone using syslog 
on a standard SATA drive isn't getting the write guarantee they think they 
are, but the downside on a crash is minimal.  If you've got a high-volume 
syslog environment (>100 lines/second), you can't support those as 
individual database writes unless you've got a battery-backed write 
controller.  A regular disk just can't process genuine fsync calls any 
faster than that.  A serious syslog deployment that turns fsync on and 
expects it to really do its thing is already exposed to this issue though. 
I think it may be a the case that a lot of people think they have durable 
writes in their configuration but really don't.

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance