On Wed, Oct 19, 2022 at 12:17 PM Andreas Joseph Krogh <andreas@xxxxxxxxxx> wrote:
> First advice, don't do it. We started off storing blobs in DB for “TX safety”
Not really an option, I'm afraid.
> , but backup/restore quickly became too cumbersome so we ended up moving all blobs out and only store reference in DB.
> This required us to make a “vacuum system” that cleans up the blob-storage regularly as ROLLBACK/crash can make it out of sync.
Note that the data is spread across many, mostly independent, schemas.
Might that ease backup/restore? I'm not much of a DBA though...
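For what it's worth, what I had in mind is dumping/restoring one schema at a time; just a rough sketch, the database and schema names below are made up:

    # dump a single application schema (custom format)
    pg_dump -n client_0042 -Fc -f client_0042.dump mydb
    # restore only that schema later
    pg_restore -d mydb client_0042.dump

No idea yet whether doing that across many schemas is actually less cumbersome than one big dump.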
> We chose storing as LO because with it, streaming large blobs (not using much memory) actually worked, with JDBC at least.
I'm in C++, with what I believe is efficient use of binary binds and results, and COPY wherever possible,
so about as good as it gets performance-wise with libpq, as far as I know (a rough sketch of what I mean follows below).
I haven't looked at libpq's pipeline mode, new in v14, yet though.
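To illustrate what I mean by binary binds and results, here's a minimal sketch; the table/column names (blobs, id, data), the connection string and the id value are all made up, and error handling is mostly omitted:

    // Fetch one bytea blob by id, binding the key and reading the result in binary format.
    #include <libpq-fe.h>
    #include <arpa/inet.h>   // htonl
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
        PGconn* conn = PQconnectdb("dbname=mydb");   // made-up connection string
        if (PQstatus(conn) != CONNECTION_OK) {
            std::fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        uint32_t id_be = htonl(42);                  // int4 params must be big-endian in binary format
        const char* values[1]  = { reinterpret_cast<const char*>(&id_be) };
        int         lengths[1] = { static_cast<int>(sizeof id_be) };
        int         formats[1] = { 1 };              // 1 = binary parameter
        Oid         types[1]   = { 23 };             // 23 = int4, assuming "id" is an integer column

        PGresult* res = PQexecParams(conn,
                                     "SELECT data FROM blobs WHERE id = $1",
                                     1, types, values, lengths, formats,
                                     1);             // 1 = binary results: bytea arrives raw, no hex escaping
        if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1) {
            const char* bytes = PQgetvalue(res, 0, 0);
            int len = PQgetlength(res, 0, 0);
            std::vector<char> blob(bytes, bytes + len);  // copy out before PQclear invalidates the buffer
            std::printf("got %d bytes\n", len);
        }
        PQclear(res);
        PQfinish(conn);
        return 0;
    }

The binary result format is what avoids the bytea hex-encoding round trip; for bulk loads I use COPY instead of row-by-row INSERTs.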
I investigated cursors vs. plain statements too, and it's a tradeoff between latency and throughput (a sketch of the cursor loop follows below).
Not sure how good the JDBC driver for PostgreSQL is. I'm doing my best to go fast with libpq.
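For reference, this is the kind of cursor loop I mean; fetching in batches caps memory use at the cost of an extra round trip per batch. The query, table name and batch size are made up, and error handling is skipped:

    #include <libpq-fe.h>

    static void exec_cmd(PGconn* conn, const char* sql) {
        PQclear(PQexec(conn, sql));   // control statements; errors ignored in this sketch
    }

    void fetch_with_cursor(PGconn* conn) {
        exec_cmd(conn, "BEGIN");      // a cursor lives inside a transaction
        exec_cmd(conn, "DECLARE c NO SCROLL CURSOR FOR SELECT id, data FROM blobs");
        for (;;) {
            PGresult* res = PQexec(conn, "FETCH 100 FROM c");   // 100 rows per round trip
            int n = (PQresultStatus(res) == PGRES_TUPLES_OK) ? PQntuples(res) : 0;
            for (int i = 0; i < n; ++i) {
                // process PQgetvalue(res, i, 0) / PQgetvalue(res, i, 1) here
            }
            PQclear(res);
            if (n == 0) break;        // cursor exhausted (or an error occurred)
        }
        exec_cmd(conn, "CLOSE c");
        exec_cmd(conn, "COMMIT");
    }

With a plain statement the whole result set lands in one PGresult, which is faster end-to-end but can use a lot of memory with large rows; the cursor adds latency per FETCH but keeps memory bounded.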
In any case, thanks for your input. But it's not really a question of "if", rather of "how".
Putting thousands of large blobs in the file system is a no-go, assuming the clients
could even see the file system the server sees. This is a 2-tier system; there's no mid-tier
that would somehow magically handle proper security and lifetime management of these blobs.
Thanks, --DD