Re: Storing large documents - one table or partition by doc?

Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx> · Sat, 24 Sep 2016 16:30:06 -0500

On 9/24/16 6:33 AM, Dev Nop wrote:
This means that the applications are sensitive to the size of ids. A
previous incarnation used GUIDs which was a brutal overhead for large
documents.

If GUIDs *stored in a binary format* were too large, then you won't be 
terribly happy with the 24 byte per-row overhead in Postgres.

What I would look into at this point is using int ranges and arrays to 
greatly reduce your overhead:

CREATE TABLE ...(
  document_version_id int NOT NULL REFERENCES document_version
  , document_line_range int4range NOT NULL
  , document_lines text[] NOT NULL
  , EXCLUDE USING gist( document_version_id =, document_line_range && )
);

That allows you to store the lines of a document as an array of values, ie:

INSERT INTO ... VALUES(
  1
  , '[11-15]'
  , '[11:15]={line11,line12,line13,line14,line15}'
);

Note that I'm using explicit array bounds syntax to make the array 
bounds match the line numbers. I'm not sure that's a great idea, but it 
is possible.

My nightmares are of a future filled with hours of down-time caused by
struggling to restore a gargantuan table from a backup due to a problem
with just one tiny document or schema changes that require disconnecting
all clients for hours when instead I could ignore best practice, create
10k tables and process them iteratively and live in a utopia where I
never have 100% downtime only per document unavailability.

At some size you'd certainly want partitioning. The good news is that 
you can mostly hide partitioning from the application and other database 
logic, so there's not a lot of incentive to set it up immediately. You 
can always do that after the fact.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance