Re: working around JSONB's lack of stats?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/1/15 3:08 PM, Josh Berkus wrote:
On 01/30/2015 05:34 PM, Jim Nasby wrote:
On 1/30/15 2:26 PM, Josh Berkus wrote:
This would probably work because there aren't a lot of data structures
where people would have the same key:value pair in different locations
in the JSON, and care about it stats-wise.  Alternatetly, if the same
key-value pair appears multiple times in the same sample row, we could
cut the MC% by that multiple.

Even if there were multiple occurrences, this would probably still be an
improvement.

Another idea... at one time in the past when discussing statistics on
multiple columns, one idea was to build statistics on indexes. If we
built that, we could also do the same thing for at least JSONB (not sure
about JSON). Obviously doesn't help for stuff you haven't indexed, but
presumably if you care about performance and have any significant size
of data you've also indexed parts of the JSON, yes?

I'm not clear on what you're suggesting here.  I'm discussing how the
stats for a JSONB field would be stored and accessed; I don't understand
what that has to do with indexing.

The JSON problem is similar to the problem of doing multi-column statistics: there's no way to simply try to keep statistics on all possible combinations because that's something that's can be extremely large.

Something that's been proposed is only trying to keep multi-column stats on column combinations that we have an index on (which in a way is really just keeping stats on the index itself).

If we built that, we could use the same technique for JSON by simply defining the indexes you needed for whatever you were searching on.

Obviously that's not as ideal is simply keeping full statistics on everything in a JSON document, but it might be a heck of a lot easier to accomplish.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux