On 11/29/2016 01:50 AM, Thomas Güttler wrote:
On 29.11.2016 at 01:52, Mike Sofen wrote:
From: Thomas Güttler
Sent: Monday, November 28, 2016 6:28 AM
...I have 2.3 TB of files. The file count is 17M.
Since we already store our structured data in PostgreSQL, I am thinking
about storing the files in PostgreSQL, too.
Is it feasible to store files in PostgreSQL?
-------
I am doing something similar, but in reverse. The legacy MySQL
databases I’m converting into a modern Postgres data model have very
large genomic strings stored in 3 separate columns. Out of the 25 TB of
legacy data storage (in 800 dbs across 4 servers, about 22B rows), those
3 columns consume 90% of the total space, and they are used only for
reference, never in searches or calculations. They range from 1 KB to
several MB.
Since I am collapsing all 800 dbs into a single PG db, being very smart
about storage was critical. Since we’re also migrating everything to
AWS, we’re placing those 3 strings (per row) into a single JSON document
and storing the document in S3 buckets, with the globally unique PK for
the row serving as the pointer (object key) to the file… super simple.
The app tier knows to fetch the row data from the db and the
large-string JSON from S3. The retrieval time is surprisingly fast; this
is all real-time web app stuff.
This is a model that could work for anyone dealing with large objects
(text or binary). The nice part is that the original 25 TB of data
storage drops to 5 TB, a much more manageable number that leaves room
for the significant growth that is on the horizon.
Thank you, Mike, for your feedback.
Yes, I think I will drop my idea. Encoding binary data (the file
content) to text and decoding it back to binary again makes no sense. I
was not aware that this was needed.
I guess I will use a key-to-blob store like S3. AFAIK there are open
source S3 implementations available.
Just be aware that transferring only the deltas of changed files, as
rsync does, while possible, is more convoluted and more time- and
resource-consuming with something like S3.
Thank you all for your feedback!
Regards, Thomas
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)