On Wednesday, 19 October 2022 at 12:48:24, Dominique Devienne <ddevienne@xxxxxxxxx> wrote:
> On Wed, Oct 19, 2022 at 12:17 PM Andreas Joseph Krogh <andreas@xxxxxxxxxx> wrote:
> > First advice: don't do it. We started off storing blobs in the DB for “TX safety”,
>
> Not really an option, I'm afraid.
>
> > but backup/restore quickly became too cumbersome, so we ended up moving all blobs out and only storing a reference in the DB. This required us to build a “vacuum system” that cleans up the blob storage regularly, since a ROLLBACK or crash can leave it out of sync with the database.
>
> Note that the data is spread across many, mostly independent schemas. Might that ease the backup/restore? I'm not much of a DBA though...
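The thread doesn't show what that vacuum system actually looks like, but a minimal sketch of the idea, in C++ with libpq, might look like the following. Everything here is an assumption for illustration: a hypothetical documents.blob_key column holding the references, and a plain local directory standing in for the real blob store (theirs is SeaweedFS, see the PS below), which would be listed and deleted through its own API instead.

    // Orphan-blob cleanup pass (hypothetical schema and storage layout).
    #include <libpq-fe.h>
    #include <filesystem>
    #include <string>
    #include <unordered_set>
    #include <cstdio>

    int main()
    {
        PGconn *conn = PQconnectdb("dbname=app");
        if (PQstatus(conn) != CONNECTION_OK) {
            std::fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        // 1. Collect every blob key the database still references.
        std::unordered_set<std::string> referenced;
        PGresult *res = PQexec(conn, "SELECT blob_key FROM documents");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            for (int i = 0; i < PQntuples(res); ++i)
                referenced.insert(PQgetvalue(res, i, 0));
        PQclear(res);
        PQfinish(conn);

        // 2. Delete store entries nothing references any more. A real
        //    implementation would also skip entries younger than some grace
        //    period, so it does not race in-flight transactions.
        for (const auto &entry : std::filesystem::directory_iterator("/var/blobstore"))
            if (!referenced.count(entry.path().filename().string()))
                std::filesystem::remove(entry.path());

        return 0;
    }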
> > We chose storing as LO because with it, streaming large blobs (not using much memory) actually worked, with JDBC at least.
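For illustration, the same kind of streaming read is available from C/C++ via libpq's large-object functions; a minimal sketch, assuming oid identifies an existing large object and with error handling abbreviated:

    #include <libpq-fe.h>
    #include <libpq/libpq-fs.h>   // INV_READ
    #include <cstdio>

    // Stream a large object in fixed-size chunks, so the whole blob is never
    // held in memory at once. Large-object I/O must run inside a transaction.
    static void stream_large_object(PGconn *conn, Oid oid)
    {
        PQclear(PQexec(conn, "BEGIN"));
        int fd = lo_open(conn, oid, INV_READ);
        if (fd >= 0) {
            char buf[64 * 1024];
            int n;
            while ((n = lo_read(conn, fd, buf, sizeof buf)) > 0)
                fwrite(buf, 1, (size_t) n, stdout);   // or a socket, a hash, ...
            lo_close(conn, fd);
        }
        PQclear(PQexec(conn, "COMMIT"));
    }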
> I'm in C++, with what I believe is efficient use of binary binds and results, and use of COPY as much as possible, so about as good as it gets, I think, in terms of performance (as far as I know, with libpq). I haven't looked at libpq's new (in v14) pipeline mode yet, though. I also investigated cursors vs. plain statements, and it's a tradeoff between latency and throughput. Not sure how good the JDBC drivers for PostgreSQL are; I'm doing my best to go fast with libpq.
>
> In any case, thanks for your input. But it's not really a question of "if", it's a question of "how". Putting thousands of large blobs in the file system is a no-go, assuming the clients can even see the file system the server sees. This is a 2-tier system; there's no mid-tier that would somehow magically handle proper security and lifetime management of these blobs.
>
> Thanks, --DD
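"Binary binds and results" with libpq usually means PQexecParams with the format flags set to 1; a minimal sketch of a binary bytea insert (the blobs table and content column are invented for the example):

    #include <libpq-fe.h>
    #include <vector>
    #include <cstdio>

    // Insert a bytea value as a binary parameter: no text escaping or hex
    // encoding round trip. Table and column names are hypothetical.
    static bool insert_blob(PGconn *conn, const std::vector<char> &data)
    {
        const char *values[1]  = { data.data() };
        int         lengths[1] = { static_cast<int>(data.size()) };
        int         formats[1] = { 1 };              // 1 = binary parameter

        PGresult *res = PQexecParams(conn,
            "INSERT INTO blobs(content) VALUES ($1)",
            1,          // one parameter
            nullptr,    // let the server infer the type from the target column
            values, lengths, formats,
            1);         // binary results too (no rows here, but harmless)

        bool ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
        if (!ok)
            std::fprintf(stderr, "%s", PQerrorMessage(conn));
        PQclear(res);
        return ok;
    }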
Ok, just something to think about: will your database grow beyond 10TB with blobs? If so, try to calculate how long it would take to restore (and whether that still complies with your SLA), and how long the restore would take without the blobs.
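As a purely illustrative calculation (the throughput figure is an assumption, not a measurement): restoring 10 TB at a sustained 500 MB/s is roughly 10 000 000 MB / 500 MB/s ≈ 20 000 s, i.e. about 5.5 hours for the blob data alone, before any index rebuilds, whereas a few hundred GB of relational data restores in well under an hour at the same rate.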
PS: Our blobstore is not “the file system”, but SeaweedFS.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963