On Sun, 29 Nov 2015 08:24:12 -0500
Tom Smith <tomsmith1989sk@xxxxxxxxx> wrote:

> Hi, thanks for everyone's response.
>
> The issue is not just compression, but the lack of "indexing" or
> "segmentation" when a single doc has, say, 2000 top-level keys (with
> multiple levels of subkeys). Right now, if I query for one key, the
> whole doc has to be uncompressed and loaded first, and then searched
> for the single key.
>
> Compared to the traditional way of storing each top-level key in a
> separate column, this is huge overhead when a table scan is required.
> Some kind of "keyed/slotted" storage for the doc could help. (For
> illustration, all keys starting with 'A' would have their own storage
> unit, and so on, so when I search for key "A1" only that unit would be
> unpacked and traversed to get the "A1" value. It would be as if
> PostgreSQL predefined 26 columns/slots for the whole doc.) In other
> words, an internal index within each doc for fast retrieval of
> individual field values.

Sounds like you're pushing the limits of what JSONB is designed to do
(at this stage, at least). I'm not aware of any improvements in recent
versions (or head) that would do much to improve the situation, but I
don't track every commit either. If you really need this improvement
and you're willing to wait for 9.6, then I suggest you check out the
latest git version and test on that to see if anything has been done.
I doubt you'll see much, though.

As a thought experiment, the only way I can think to improve this use
case is to ditch the current TOAST system and replace it with something
that stores large JSON values in a form optimized for indexed access.
That's a pretty massive change to some fairly core stuff just to
optimize a single use case of a single data type. Not saying it won't
happen ... in fact, all things considered, it's pretty likely to happen
at some point.

As far as a current solution: my suggestion would be to decompose the
JSON into an optimized table.
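For context, the access pattern Tom describes looks like this (table and key names here are illustrative, not from his schema); PostgreSQL has to detoast the entire stored document before it can extract the one key:

```sql
-- Illustrative only: 'store1' and key 'A1' are assumed names.
-- The whole (possibly compressed, TOASTed) 2000-key document is
-- decompressed and loaded just to return this single value:
SELECT data -> 'A1' AS a1_value
FROM   store1
WHERE  id = 42;
```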
I.e.:

    CREATE TABLE store1 (
        id SERIAL PRIMARY KEY,
        data JSONB
    );

    CREATE TABLE store2 (
        id INT NOT NULL REFERENCES store1(id),
        top_level_key VARCHAR(1024),
        data JSONB,
        PRIMARY KEY(top_level_key, id)
    );

You can then use a trigger to ensure that store2 is always in sync with
store1. Lookups can then use store2 and will be quite fast because of
the index.

A lot of the design is conjectural: do you even still need the data
column on store1? Are there other useful indexes? Etc. But hopefully
the general idea is clear.

These probably aren't the answers you want, but (to the best of my
knowledge) they're the best answers available at this time. I'd really
like to build the alternate TOAST storage, but I'm not in a position to
start on a project that ambitious right now ... I'm not even really
keeping up with the project I'm currently supposed to be doing.

-- 
Bill Moran

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
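As a minimal sketch of the sync trigger Bill describes (the function and trigger names are mine, and it only handles INSERT/UPDATE on store1 — DELETE handling and error cases are left out):

```sql
-- Sketch only: names and the delete-then-reinsert strategy are
-- assumptions, not from the original post.
CREATE OR REPLACE FUNCTION sync_store2() RETURNS trigger AS $$
BEGIN
    -- Drop rows belonging to any previous version of this document.
    DELETE FROM store2 WHERE id = NEW.id;

    -- Re-insert one row per top-level key, each carrying only
    -- that key's value.
    INSERT INTO store2 (id, top_level_key, data)
    SELECT NEW.id, kv.key, kv.value
    FROM   jsonb_each(NEW.data) AS kv(key, value);

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER store1_sync
AFTER INSERT OR UPDATE ON store1
FOR EACH ROW EXECUTE PROCEDURE sync_store2();
```

A single-key lookup then only touches the small per-key rows via the primary-key index, e.g. `SELECT data FROM store2 WHERE top_level_key = 'A1' AND id = 42;`, instead of detoasting the whole document.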