Re: Table Design for Many Updates

No, they aren’t connecting directly to the database.  They go through a web API.  The data is queued on the device and uploaded in the background, so a little latency isn’t a problem.  I’ve looked at Amazon’s SQS, but I don’t like the idea of being tied to a specific hosting vendor.


Thanks,

Craig


From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of John R Pierce
Sent: Tuesday, January 10, 2017 1:48 PM
To: pgsql-general@xxxxxxxxxxxxxx
Subject: Re: Table Design for Many Updates


On 1/10/2017 1:42 PM, David G. Johnston wrote:

On Tue, Jan 10, 2017 at 2:33 PM, Craig Boucher <craig@xxxxxxxxxx> wrote:

I have a multi-tenant database that I'm migrating from SQL Server to PostgreSQL 9.6.1.  I've read the recent articles about the potential write-amplification issue in Postgres.  One particular table has 14 columns, a primary key, five foreign keys, and eight indexes.  We have a little over a thousand devices (a number that will grow over time) on the Internet; each inserts a row into this table and then updates two columns in that row about once a minute for the next two hours.  The two columns are NOT NULL and are not foreign-key or indexed columns.  I've thought about moving them to a one-to-one related table.  Any thoughts on whether this is a wise move, or am I making a mountain out of a molehill?  It looks like this scenario would be covered by heap-only tuple (HOT) updates, but with over a hundred updates to the same row and over a thousand different rows being updated at a time, will I reap the benefits?
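
Concretely, the split I have in mind looks something like this (table and column names here are made up for illustration; the real table has more columns, foreign keys, and indexes):

CREATE TABLE device_session (
    session_id  bigserial PRIMARY KEY,
    device_id   integer NOT NULL,
    started_at  timestamptz NOT NULL DEFAULT now()
    -- ... plus the other static, indexed, and FK columns ...
);

-- The two frequently updated NOT NULL columns would move to a narrow
-- side table, one row per row of the main table:
CREATE TABLE device_session_progress (
    session_id    bigint PRIMARY KEY REFERENCES device_session,
    last_seen_at  timestamptz NOT NULL,
    update_count  integer NOT NULL
);

-- Each device's once-a-minute update would then touch only the narrow
-- table, which has no index beyond its primary key:
UPDATE device_session_progress
   SET last_seen_at = now(), update_count = update_count + 1
 WHERE session_id = $1;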



I do hope those 1000-and-growing devices aren't connecting directly to SQL, but are instead talking to some sort of app service that queues up requests to a sane number of database worker processes, where the worker-process count can be tuned to balance throughput and response time?



With a reasonable fillfactor on the table you would probably be OK - but I'm partial to separating the static and dynamic data into separate tables if the rest of the model and the intended applications support it.  The main concern is how many of your queries have a WHERE clause that includes fields from both sets: cross-table statistical estimates are problematic, but if you don't have to worry about them, it would be conceptually cleaner to set up a one-to-one relationship here.
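
For example (using the made-up table name from your sketch; 70 is only an illustrative starting point, not a recommendation):

-- Leave ~30% of each heap page free so the frequent updates can stay
-- HOT, i.e. write the new tuple version into the same page:
ALTER TABLE device_session SET (fillfactor = 70);

-- Note that fillfactor only affects newly written pages; existing data
-- would need a table rewrite (e.g. VACUUM FULL, which takes an
-- exclusive lock) to pick it up.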


If those two updated fields are not indexed, it's unlikely they're being used in WHERE clauses.

The real question with HOT is whether the table can be vacuumed frequently enough that the tuples can keep being updated in the same block.
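
Per-table autovacuum settings are one knob for that (values here are only illustrative, with the same made-up table name as in the earlier sketch):

ALTER TABLE device_session SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum once ~1% of rows are dead
    autovacuum_vacuum_cost_delay   = 0      -- don't throttle vacuums on this table
);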


-- 
john r pierce, recycling bits in santa cruz
