Hi. I'd like some advice on storing blobs (millions of them), ranging from very small to large (> 1GB). I know about bytea versus lo, and I have probably read most of what's out there about them :)

Up front, I have to state that I'm not keen on lo, because of security considerations. We store blobs in many different schemas, and users can access some schemas and not others. Since the lo table is unique for the whole database, it would let users see blobs from any schema, as I understand it. Right? OTOH, lo has random access, which I also need... (I'm also not a fan of lo needing triggers for lifetime management, nor of the requirement to have an explicit transaction to use lo.)

Here are the main requirements I need to fulfil:

1) Store literally millions of rows, one "blob" per row (scientific data).
2) Most "blobs" are smallish, below a few KBs. bytea is perfect for that.
3) Yet many blobs are in the dozens of MBs. bytea is still OK.
4) But some blobs exceed the 1GB bytea limit. I found at least a dozen just in our test data, so clients will have them.
5) For accessing the larger blobs, the API I must implement reads contiguous chunks of the blobs. Thus I need random access.

I'm porting the backend of that API from Oracle to PostgreSQL. In Oracle we used RAW for smaller values, and SecureFile blobs for larger ones. Oracle LOBs are similar to lo, I guess: they provide random access and perform well at large sizes, but are slower than bytea for smaller ones. And they're a PITA to deal with: locators, broken pre-fetching, etc. PostgreSQL bytea is much better and simpler, except that it's limited to 1GB...

bytea also has no direct random access, except via substr[ing], but how efficient and "random access" is that? For example, SQLite stores large values in overflow pages, and has to pointer-chase the chain of those pages for large blobs, so its substr() "random access" is NOT O(1) and involves way too much I/O, i.e. it's O(N). Do TOAST'ed large values (text or bytea) fare better in terms of substr() efficiency, closer to O(log N) and reading only what's actually requested by the substr() (modulo the partial ~2KB TOAST chunks at either end of the range)? (See the example below for the kind of access pattern I mean.)

Some posts on the internet also peg lo as faster than bytea above 20MB, for example. Is that even true? One post I read (from Daniel Verite, I think) kind of hinted that the lo table is not that different from TOAST tables, with 2K chunks, so I don't see why lo would be any faster than a TOAST'ed bytea. Any insights someone with knowledge of the internals can share on this?

At this point, I have not yet redone the extensive benchmarking we did a few years ago between Oracle and PostgreSQL, which compared lo versus bytea back then. And benchmarks are hard to do without easy access to different kinds of servers with different storage backends, or to cloud-hosted PG. It's too easy to pick up bias from a local setup, leading to a design that would not perform optimally elsewhere. That's why I'm asking a more open-ended question of the experts on this list.

Because of #4 above, I need to either use lo (but see above, notably the security concern), or roll my own "TOAST^2" (TOAST squared), as I call it, where I manually "shard" / chunk large blobs into an auxiliary table, itself TOAST'ed of course, with some chunk-size threshold (e.g. 1, 4, or 16 MB perhaps). The latter keeps the blobs in the schema (good for security), lifetime is managed by FKs (as usual), and the sharding limits the ill effects of "emulating" random access with substr() if necessary. (A rough sketch of that layout follows below.)
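To be concrete about the access pattern, here is roughly what the API needs, on a hypothetical table (sample / payload / sample_id are placeholder names, not our actual schema). From what I've read, the column would have to use EXTERNAL storage (out-of-line but uncompressed) for substring() to be able to fetch only the TOAST chunks covering the requested slice; please correct me if that assumption is wrong:

CREATE TABLE sample (
    sample_id bigint PRIMARY KEY,
    payload   bytea NOT NULL
);

-- Keep large values out-of-line and uncompressed, so that (hopefully)
-- substring() only reads the TOAST chunks it actually needs:
ALTER TABLE sample ALTER COLUMN payload SET STORAGE EXTERNAL;

-- Typical random access: 64 KB at byte offset 100,000,000 (substring is 1-based):
SELECT substring(payload FROM 100000001 FOR 65536)
FROM sample
WHERE sample_id = 42;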
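And to make the "TOAST^2" idea more concrete, here is a minimal sketch of the layout I have in mind (all names and the fixed 1 MB chunk size are placeholders, not a finished design):

-- One row per blob, in the project's schema; chunk rows are TOAST'ed as usual.
CREATE TABLE blob (
    blob_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    length  bigint NOT NULL                 -- total size in bytes
);

CREATE TABLE blob_chunk (
    blob_id bigint  NOT NULL REFERENCES blob ON DELETE CASCADE,
    seq     integer NOT NULL,               -- 0-based chunk index
    data    bytea   NOT NULL,               -- at most 1 MB (1048576 bytes) each
    PRIMARY KEY (blob_id, seq)
);
ALTER TABLE blob_chunk ALTER COLUMN data SET STORAGE EXTERNAL;

-- Random access: read p_len bytes at 0-based offset p_off, touching only
-- the chunk rows that overlap the requested range.
CREATE FUNCTION blob_read(p_id bigint, p_off bigint, p_len int)
RETURNS bytea LANGUAGE sql STABLE AS $$
  SELECT string_agg(
           substring(data
                     FROM (greatest(p_off - seq::bigint * 1048576, 0) + 1)::int
                     FOR  (least(p_off + p_len - seq::bigint * 1048576, 1048576)
                           - greatest(p_off - seq::bigint * 1048576, 0))::int),
           ''::bytea ORDER BY seq)
  FROM blob_chunk
  WHERE blob_id = p_id
    AND seq BETWEEN p_off / 1048576
                AND (p_off + p_len - 1) / 1048576;
$$;

With 1 MB chunks, a 64 KB read at an arbitrary offset touches at most two chunk rows, lifetime follows the FK, and everything stays in the project's schema.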
I've already done things like this in SQLite land (which also has a 1GB limit for its text and blob types). So is this a terrible idea? What alternatives do I have? I'd really appreciate some expert advice on the above, before I go too far down the rabbit hole.

Thanks, --DD

PS: Another drawback of lo is that, because it's a single table, it is still subject to the 32 TB limit on a relation. The 4 TB limit per large object is way more than we need, but the 32 TB limit may actually be an issue for our larger clients, which have thousands of projects, each with upwards of a few million of those blobs. With bytea, the values are stored in per-project schemas (a design constraint), so that limit applies per project, which will be more than enough. For the sum of all projects in a single lo table, maybe not: with a real client case of 3K projects, that leaves an average of only ~10 GB of lo's per project (i.e. per schema), which could very well be problematic...
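For what it's worth, the back-of-the-envelope arithmetic behind that last figure, assuming the 32 TB were spread evenly over ~3K project schemas:

SELECT pg_size_pretty(32::numeric * 1024 * 1024 * 1024 * 1024 / 3000);
-- ~ 11 GB of lo storage per project, on average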