
Re: How to store "blobs" efficiently for small and large sizes, with random access

On Wed, Oct 19, 2022 at 5:05 PM Laurenz Albe <laurenz.albe@xxxxxxxxxxx> wrote:
> On Wed, 2022-10-19 at 12:48 +0200, Dominique Devienne wrote:
> > On Wed, Oct 19, 2022 at 12:17 PM Andreas Joseph Krogh <andreas@xxxxxxxxxx> wrote:
> > > First advice, don't do it. We started off storing blobs in DB for “TX safety”
> > Not really an option, I'm afraid.
> You should reconsider.  Ruling out that option now might get you into trouble
> later.  Large Objects mean trouble.

Andreas, Ericson, Laurenz, thanks for the advice.
I'll be sure to discuss these concerns with the team.

We already keep other (bigger) data in the file system, albeit data of a more read-only nature. How security is handled on that side is an area I'm not familiar with, so I'll investigate whether there's a viable path to externalize the largish blobs currently destined to live in the DB. So I hope you can see I'm not dismissing what you are saying.

But before I leave this thread for now, I'd like to add that I consider it unfortunate that NOT putting the data in the DB is the mostly agreed-upon advice. IMHO it points to a weak spot of PostgreSQL, which has not invested in those large-data use cases, perhaps with more file-system-like techniques. Probably that's because most large users of PostgreSQL are on the "business" side (numerous rows, but of smaller sizes) rather than the "scientific" side, which (too often) uses files and files-in-a-file formats like HDF5.

FWIW, when Oracle introduced SecureFile LOBs years ago in 11g, they represented a leap forward in performance; back then we were seeing them run 3x faster than the previous LOB implementation at GB sizes, if I recall correctly, with throughput that challenged regular networked file systems like NFS. That was over 10 years ago, so who knows where things stand now. And from the posts here, the issues with large blobs may be related more to backup/restore than to runtime performance.
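
To illustrate that last point with what's available today: pg_dump treats Large Objects as all-or-nothing, each one becoming its own archive entry. A minimal sketch, assuming a hypothetical database named appdb and PostgreSQL 10 or later:

    # Each Large Object is a separate entry in the archive, so dumps of
    # blob-heavy databases balloon and restores slow down accordingly.
    # -B / --no-blobs (PostgreSQL 10+) skips them; -b / --blobs forces them in.
    pg_dump -Fc -B appdb -f appdb_no_blobs.dump
    pg_dump -Fc -b appdb -f appdb_with_blobs.dump
    pg_restore -d appdb appdb_no_blobs.dump

So the relational data can be dumped quickly, but only by leaving the blobs behind.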

Having all the data in the DB, under a single security model, is a big win for consistency and simplicity, and the fact that it's not really practical today is a pity, in my mind. My (probably uninformed) opinion is that large blobs are handled just like other relational data, in paged storage designed for smaller values; i.e. file-like blobs are shoehorned into structures that are inappropriate for them, and a rethink and redesign specifically for them is needed, similar to the Oracle SecureFile one of old.
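
To make that concrete: the two in-database options today are bytea (sliced into roughly 2 kB TOAST chunks) and Large Objects (2 kB pages in pg_largeobject, with a seek/read/write API). A minimal sketch of random access with each, using standard functions but a table and values of my own invention:

    -- bytea: TOASTed in ~2 kB chunks; EXTERNAL storage disables compression,
    -- which lets substring() fetch only the chunks covering the byte range.
    CREATE TABLE assets (id bigint PRIMARY KEY, payload bytea);
    ALTER TABLE assets ALTER COLUMN payload SET STORAGE EXTERNAL;
    -- read 4 kB at a 1 MiB offset without detoasting the whole value
    SELECT substring(payload FROM 1048577 FOR 4096) FROM assets WHERE id = 1;

    -- Large Objects: random access via server-side lo_get/lo_put (9.4+),
    -- or lo_open/lo_lseek64/lo_read from the client.
    SELECT lo_create(0);                          -- allocates a new OID, say 16385
    SELECT lo_put(16385, 1048576, '\xdeadbeef');  -- write 4 bytes at a 1 MiB offset
    SELECT lo_get(16385, 1048576, 4096);          -- read 4 kB back from that offset

Both work, but both sit on top of page-oriented storage designed for much smaller values, which is exactly my point.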

I have similar gripes with SQLite, which is otherwise a fantastic embedded DB. Just see how the SQLite-based Fossil SCM fails to scale for very large repos with big (e.g. game) assets, and how a DB-backed store similarly failed to scale in SVN long ago, to be replaced by a forest-of-files layout (which Git also uses).

DBs like PostgreSQL and SQLite should be better at this, and I hope they get there eventually. Sorry to turn a bit philosophical here; it's not a critique per se, more the personal musings of a dev who's been in this space a long time. FWIW. Thanks, --DD





