Re: performance for high-volume log insertion

Stephen Frost <sfrost@xxxxxxxxxxx> · Wed, 22 Apr 2009 08:44:44 -0400

* James Mansion (james@xxxxxxxxxxxxxxxxxxxxxx) wrote:
> Fine.  But like I said, I'd suggest measuring the fractional improvement for this
> when sending multi-row inserts before writing something complex.  I think the
> big win will be doing multi-row inserts at all.

You're rehashing things I've already said.  The big win is batching the
inserts, however that's done, into fewer transactions.  Sure, multi-row
inserts could be used to do that, but so could wrapping the existing
inserts in BEGIN/COMMIT right now, which probably takes even less effort.
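To make the "fewer transactions" point concrete, here is a minimal sketch.  It
uses Python's built-in sqlite3 module purely as a stand-in for a PostgreSQL
client library; the `log` table and both function names are hypothetical.  The
only difference between the two functions is where the transaction boundaries
fall.

```python
import sqlite3

def insert_per_row(conn, messages):
    # One transaction per message: every row pays for its own COMMIT.
    for msg in messages:
        conn.execute("INSERT INTO log (msg) VALUES (?)", (msg,))
        conn.commit()

def insert_batched(conn, messages):
    # Same single-row INSERTs, but all inside one transaction:
    # a single COMMIT covers the whole batch.
    with conn:  # commits on success, rolls back if an exception escapes
        for msg in messages:
            conn.execute("INSERT INTO log (msg) VALUES (?)", (msg,))
```

Usage: create the table with `conn.execute("CREATE TABLE log (msg TEXT NOT NULL)")`
and call `insert_batched(conn, pending_messages)` instead of committing per row.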

> If you are going to prepare then you'll need a collection of different prepared
> statements for different batch sizes (say 1,2,3,4,5,10,20,50) and things will
> get complicated.  A multi-row insert with unions and dynamic SQL is actually
> rather universal.

No, as was already pointed out, you really only need two: a single-row
insert, and a batch insert of some fixed size.  It'd be interesting to
see whether there's really much of a performance difference between one
50-row prepared statement and 50 executions of a single-row prepared
statement.  If they're both done inside larger transactions, I'm not
sure there's a lot of performance difference.
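A rough sketch of the "only two statements" approach, again with sqlite3
standing in for PostgreSQL.  `BATCH_SIZE`, the `log` table and
`flush_two_statements` are hypothetical.  Because the batch statement's SQL
text never changes, the driver can reuse one prepared statement for it (Python's
sqlite3 caches compiled statements by SQL text; with PostgreSQL you would use
PREPARE or the driver's equivalent).  Whatever doesn't fill a whole batch goes
through the single-row statement.

```python
import sqlite3

BATCH_SIZE = 50  # hypothetical fixed size for the batch statement

# Statement 1: a plain single-row insert.
SINGLE_SQL = "INSERT INTO log (msg) VALUES (?)"
# Statement 2: one multi-row insert with exactly BATCH_SIZE placeholders.
BATCH_SQL = "INSERT INTO log (msg) VALUES " + ", ".join(["(?)"] * BATCH_SIZE)

def flush_two_statements(conn, messages):
    """Insert messages using only the two statement shapes, in one transaction."""
    full = len(messages) - len(messages) % BATCH_SIZE  # rows covered by full batches
    with conn:
        for start in range(0, full, BATCH_SIZE):
            conn.execute(BATCH_SQL, messages[start:start + BATCH_SIZE])
        for msg in messages[full:]:  # the remainder, one row at a time
            conn.execute(SINGLE_SQL, (msg,))
```

With 123 queued messages this runs the batch statement twice and the
single-row statement 23 times, all under one COMMIT.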

> Personally I'd implement that first (and it should be easy to do across multiple
> dbms types) and then return to it to have a more complex client side with
> prepared statements etc if (and only if) necessary AND the performance
> improvement were measurably worthwhile, given the indexing and storage
> overheads.

Storage overhead?  Indexing overhead?  We're talking about prepared
statements here; what additional storage requirement do you think those
would impose?  What additional indexing overhead?  I don't believe we
actually do anything differently between prepared statements and
multi-row inserts that would change either of those.

> There is no point optimising away the CPU of the simple parse if you are
> just going to get hit with a lot of latency from round trips, and forming a
> generic multi-insert SQL string is much, much easier to get working as a first
> step. Server CPU isn't a bottleneck all that often - and with something as
> simple as this you'll hit IO performance bottlenecks rather easily.

Ah, latency is a reasonable thing to bring up.  Of course, if you want
to talk about latency, then you also have to consider that multi-row
insert statements inherently produce larger packets, which could be
delayed under some QoS arrangements.

As I said, most of this is a rehash of things already said.  The
low-hanging fruit here is doing multiple inserts inside of a
transaction, rather than 1 insert per transaction.  Regardless of how
that's done, it's going to give the best bang for the buck.  It will
complicate the client code somewhat, however it's implemented, to
handle failures gracefully (if that's something you care about,
anyway), but since rsyslog already has some queueing mechanisms,
hopefully it won't be too bad.
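One way the failure handling could look, as a sketch only: this is not
rsyslog's queue API.  A `collections.deque` stands in for whatever queue the
client already has, sqlite3 again stands in for PostgreSQL, and `flush_queue`
and `max_batch` are hypothetical.  If anything in the batch fails, the whole
transaction is rolled back and the messages go back to the front of the queue
in their original order, so nothing is lost and a later attempt can retry.

```python
import sqlite3
from collections import deque

def flush_queue(conn, queue, max_batch=100):
    """Drain up to max_batch messages from queue in one transaction.

    Returns the number of rows committed (0 if the batch was rolled back).
    """
    batch = [queue.popleft() for _ in range(min(max_batch, len(queue)))]
    if not batch:
        return 0
    try:
        with conn:  # COMMIT on success, ROLLBACK (and re-raise) on error
            conn.executemany("INSERT INTO log (msg) VALUES (?)",
                             [(m,) for m in batch])
    except sqlite3.Error:
        # Put the batch back at the front, preserving order, for a retry.
        queue.extendleft(reversed(batch))
        return 0
    return len(batch)
```

Note that one bad message blocks the whole batch here; a real client would
need some policy for that (for example, retrying that batch row by row).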

	Thanks,

		Stephen