On Wed, Oct 19, 2022 at 3:05 PM Alvaro Herrera <alvherre@xxxxxxxxxxxxxx> wrote: > On 2022-Oct-19, Dominique Devienne wrote: > > OTOH, lo has random access, which I also need... > > Generally speaking, bytea sucks for random access, because if a TOAST > item is compressed, it has to be always read from the beginning in order > to decompress correctly. However, if you set > ALTER TABLE tab ALTER COLUMN byt SET STORAGE EXTERNAL > then compression is not used, and random access becomes fast. Thank you Álvaro. Quite insightful as well. I was aware of TOAST compression, but didn't make the connection to the side-effects on random-access. But now that TOAST has LZ4 support, which decompresses extremely fast, compared to ZLib (I have experience with LZ4 for a WebSocket-based server messages), and choosing an appropriately small shard/chunk size, that might be mitigated somewhat. Would need testing / benchmarking to compare uncompressed vs LZ4, at various chunk and subset/offset sizes, of course. Anybody has an answer to my question regarding how substr() works on bytea values? I.e. is it "pushed down" / optimized enough that it avoids reading the whole N-byte value, to then pass it to substr(), which then returns an M-byte value (where M < N)? If TOAST stores 2,000 chunks, and those chunks' PKs are numbers, simple arithmetic should be able to select only the chunks of interest, those incurring only the necessary IO for the selected range, no? Or the fact subsetting a bytea currently requires substr() prevents using such a scenario? And if so, why not support a native subsetting notation that did support that scenario, like the obvious bytea_col[offset, count] or bytea_col[start:end]? Seems to be me efficient native subsetting of varlength values would be quite valuable.