How to improve cockroach performance with pgbench?

Fabien COELHO <coelho@xxxxxxxxxxxx> · Wed, 29 Sep 2021 15:47:40 +0200 (CEST)

Hello,

I've been playing with CockroachDB, a distributed database system which is 
more or less compatible with Postgres because it implements the same 
network protocol. Because of this compatibility, I have used pgbench to 
set up and run some tests on various AWS VMs (5 identical VMs, going up to 
a total of 80 vCPUs in the system).

The general behavior and ease of use are great. Data are spread across 
nodes, and adding a new node makes the system automatically replicate and 
rebalance the data, wow. Also, the provided web interface is quite nice and 
gives hints about what is happening. They implement an automatic retry 
feature so that when a transaction fails it is retried without the client 
needing to know about it.

All this is impressive, but performance-wise I ran into a few issues and/or 
questions:

 - Loading data with COPY (pgbench -i) is pretty slow, typically 3
   seconds per scale unit, whereas on a basic postgres I would get 0.3
   seconds per scale unit, i.e. 10 times faster. Should I expect better
   performance, or is this the expected performance given the automatic
   (automagic) replication performed by cockroach? Would it be better if
   I generated data from several connections (hmmm, pgbench does not know
   how to do that, but the tool could be improved if it is worth it)?

 - I'm struggling to find the right number of client connections to
   "maximise" tps under reasonable latency. Some of my tests suggest that
   maybe 4 clients per core is the best option, whereas for a standard
   postgres I would typically use more, around 8-10 clients per core.
   Is this choice reasonable for cockroach?

 - The overall performance is a little bit disappointing. Ok, this is
   a distributed system which does automatic partitioning and replication
   on serializable transactions, so obviously this quality of service must
   cost something, but I'm typically running around 10 tps per core (with
   the default pgbench transaction), which means pretty high latency, and
   even though it scales somewhat, this seems quite low.
   What am I doing wrong? What should I check?

 - Another strange thing is that the steady state at full speed is quite
   unstable: looking at instantaneous performance, the tps varies a lot,
   e.g. between 0 and 4500 tps, more or less uniformly, i.e. the standard
   deviation is large, say a 1000 tps stddev for a 2000 tps average.
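The back-of-envelope numbers behind these questions can be made explicit 
with a small Python sketch. It assumes pgbench's standard 100,000 
pgbench_accounts rows per scale unit, and closed-loop clients (no --rate 
throttling), so Little's law gives mean latency = clients / throughput. 
`split_aid_ranges` is a hypothetical helper showing how a multi-connection 
loader might partition the aid key space; it is not an existing pgbench 
option.

```python
# Back-of-envelope figures for the questions above (illustrative sketch only).

ROWS_PER_SCALE = 100_000  # pgbench_accounts rows per scale unit


def load_rate(seconds_per_scale: float) -> float:
    """pgbench_accounts rows loaded per second, given seconds per scale unit."""
    return ROWS_PER_SCALE / seconds_per_scale


def closed_loop_latency(clients_per_core: float, tps_per_core: float) -> float:
    """Mean latency in seconds via Little's law: clients = throughput * latency.

    Assumes every client sends its next transaction immediately (no --rate),
    so each client is always inside a transaction.
    """
    return clients_per_core / tps_per_core


def coefficient_of_variation(stddev: float, mean: float) -> float:
    """Relative spread of the instantaneous tps: stddev / mean."""
    return stddev / mean


def split_aid_ranges(scale: int, connections: int) -> list[tuple[int, int]]:
    """Hypothetical helper: split aid 1..scale*100000 into contiguous,
    inclusive ranges, one per loading connection."""
    if connections < 1:
        raise ValueError("need at least one connection")
    total = scale * ROWS_PER_SCALE
    base, extra = divmod(total, connections)
    ranges = []
    start = 1
    for i in range(connections):
        # The first `extra` connections take one more row each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    print(f"cockroach load: {load_rate(3.0):.0f} rows/s")
    print(f"postgres load:  {load_rate(0.3):.0f} rows/s")
    print(f"latency, 4 clients/core at 10 tps/core: "
          f"{closed_loop_latency(4, 10) * 1000:.0f} ms")
    print(f"tps coefficient of variation: {coefficient_of_variation(1000, 2000):.2f}")
    print(split_aid_ranges(scale=10, connections=4))
```

With these figures, 4 clients per core at 10 tps per core works out to 
about 400 ms per transaction, and a 1000 tps stddev on a 2000 tps average 
is a coefficient of variation of 0.5.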

Basically, any advice about cockroach configuration and running pgbench 
against it is welcome!

Thanks in advance,

--
Fabien.